Huffman Coding by Akas

HUFFMAN CODING 201
FEASIBILITY STUDY
In computer science and information theory, Huffman coding is an entropy
encoding algorithm used for lossless data compression. The term refers to
the use of a variable-length code table for encoding a source symbol (such
as a character in a file) where the variable-length code table has been
derived in a particular way based on the estimated probability of occurrence
for each possible value of the source symbol. It was developed by David A.
Huffman at MIT, and published in the 1952 paper "A Method for the
Construction of Minimum-Redundancy Codes".
Huffman coding uses a specific method for choosing the representation for
each symbol, resulting in a prefix code (sometimes called a "prefix-free
code"): the bit string representing any particular symbol is never a
prefix of the bit string representing any other symbol. The code expresses
the most common source symbols using shorter strings of bits than are used
for less common ones. Huffman was able to design the most efficient
compression method of this type: no other mapping of individual source
symbols to unique strings of bits will produce a smaller average output size
when the actual symbol frequencies agree with those used to create the
code.
1.1 History:
In 1951, David A. Huffman and his MIT information theory classmates were
given the choice of a term paper or a final exam. The professor, Robert M.
Fano, assigned a term paper on the problem of finding the most efficient
binary code. Huffman, unable to prove any codes were the most efficient,
was about to give up and start studying for the final when he hit upon the
idea of using a frequency-sorted binary tree and quickly proved this method
the most efficient. In doing so, the student outdid his professor, who had
worked with information theory inventor Claude Shannon to develop a
similar code.
1.2 The algorithm:

1. Scan text to be compressed and tally occurrence of all characters.
2. Sort or prioritize characters based on number of occurrences in text.
3. Build Huffman code tree based on prioritized list.
4. Perform a traversal of tree to determine all code words.
5. Scan text again and create new file using the Huffman code.
The above construction algorithm uses a priority queue in which the node with
the lowest probability is given the highest priority. This priority queue is
used to build the Huffman tree, which assigns fewer bits to more frequent
symbols and more bits to symbols that occur less frequently. In other words,
shorter code words go to common symbols and longer code words to rare
symbols; the longer code words cost little because they are seldom used, and
this is how data compression is achieved.
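The five steps above can be sketched compactly. The following is a minimal illustration, not the project's own code: it assumes the standard library's std::priority_queue in place of the singly linked list chosen later in this report, and the names Node, LowestFirst, collectCodes and huffmanCodes are hypothetical. Tie-breaking between equal frequencies is left to the library, so only the code lengths, not the exact bit patterns, should be relied on.

```cpp
#include <deque>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type for this sketch (not the report's 'tree' struct).
struct Node {
    char symbol;        // meaningful only for leaves
    long freq;          // total frequency of the subtree
    const Node* left;   // nullptr for leaves
    const Node* right;  // nullptr for leaves
};

// Comparator that makes the LOWEST frequency the highest priority.
struct LowestFirst {
    bool operator()(const Node* a, const Node* b) const {
        return a->freq > b->freq;
    }
};

// Step 4: walk the tree, appending 0 for a left branch and 1 for a right one.
void collectCodes(const Node* node, const std::string& prefix,
                  std::map<char, std::string>& codes) {
    if (node->left == nullptr && node->right == nullptr) {
        // A lone symbol has an empty path; assign it "1" by convention.
        codes[node->symbol] = prefix.empty() ? "1" : prefix;
        return;
    }
    collectCodes(node->left, prefix + "0", codes);
    collectCodes(node->right, prefix + "1", codes);
}

// Steps 2 to 4: prioritise the symbols, build the tree, read off the codes.
std::map<char, std::string> huffmanCodes(const std::map<char, long>& freqs) {
    std::map<char, std::string> codes;
    if (freqs.empty()) return codes;

    std::deque<Node> pool;  // owns all nodes; push_back keeps addresses valid
    std::priority_queue<const Node*, std::vector<const Node*>, LowestFirst> pq;
    for (const auto& entry : freqs) {
        pool.push_back(Node{entry.first, entry.second, nullptr, nullptr});
        pq.push(&pool.back());
    }
    // Repeatedly merge the two least frequent nodes into a composite node.
    while (pq.size() > 1) {
        const Node* first = pq.top();
        pq.pop();
        const Node* second = pq.top();
        pq.pop();
        pool.push_back(Node{'\0', first->freq + second->freq, first, second});
        pq.push(&pool.back());
    }
    collectCodes(pq.top(), "", codes);
    return codes;
}
```

Calling huffmanCodes with frequencies 40, 35, 20 and 5 for four symbols gives code lengths of 1, 2, 3 and 3 bits respectively.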
1.3 Problem definition:
“To design a system that allows the user to enter the total number of
characters with their frequencies at the terminal and then displays the
Huffman codes on the terminal in an interactive manner.”
The main aim of the feasibility study is to determine whether it would be
financially and technically feasible to develop the product. After
thoroughly analyzing the problem definition and the Huffman coding algorithm
from various standard books on information theory and from the internet,
various strategies for solving the problem were analyzed, and finally the
algorithm based on a priority queue (singly linked list) was chosen.
1.3.1 Technical feasibility
We need a C++ compiler and a computer for our Huffman algorithm, both of
which are available. The personnel involved with the project should be well
versed in the basic concepts of C, C++ and data structures.
1.3.2 Economical Feasibility:

Economic feasibility depends on the complexity of the problem and the
number of personnel involved with the project. Huffman coding is a
moderately complex problem and thus needs at least 4 group members. Our team
comprises 4 engineering students who are familiar with data structures, and
thus the project is economically feasible.
1.3.3 Operational Feasibility:
This project is developed for information theory professionals, and the
proposed system provides fast and efficient operations, so it will be
acceptable to a large extent. The proposed system is therefore operationally
feasible.
1.3.4 Time feasibility

Our Huffman project should be completed by the 12th of April 2010. This
time is feasible as per the scheduling conducted by the project manager.
2 REQUIREMENT ANALYSIS AND SPECIFICATION

The aim of the requirements analysis and specification phase is to
understand the exact requirements of the customer and to document them
properly. This phase consists of two distinct activities:
2.1 Requirement gathering:
The goal of the requirement gathering activity is to collect all relevant
information regarding the product to be developed from the customer, with a
view to clearly understanding the customer requirements. To thoroughly
understand the problem, first a study of graph theory was conducted using
standard textbooks (refer to the Bibliography) to understand the Huffman
tree. Then the study of Huffman coding was taken up from various relevant
sources such as the internet and books on data compression. Finally, the
method used by Huffman coding to compress data was studied to understand the
motive behind the project.
2.2 Requirement analysis:
The goal of the requirement analysis activity is to weed out the
incompleteness and inconsistencies in the requirements gathered above. The
data collected from various sources, including a group of users, usually
contains several contradictions, ambiguities, gaps and inconsistencies,
since each user typically has only a partial and incomplete view of the
system. In the case of the Huffman coding algorithm the main requirement is
the frequency distribution table, which needs to be checked for
contradictions, ambiguities, and incompleteness.
For example, the user should not enter the same symbol twice, as this would
be inconsistent: each symbol has only one Huffman code. If the user enters
only one symbol, the input is incomplete; in that case the algorithm is not
run and we conventionally assign 1 or 0 to the symbol. Similarly, the number
of symbols can't be greater than 94.

ASCII Hex Symbol   ASCII Hex Symbol   ASCII Hex Symbol    ASCII Hex Symbol
0     0   NUL      16    10  DLE      32    20  (space)   48    30  0
1     1   SOH      17    11  DC1      33    21  !         49    31  1
2     2   STX      18    12  DC2      34    22  "         50    32  2
3     3   ETX      19    13  DC3      35    23  #         51    33  3
4     4   EOT      20    14  DC4      36    24  $         52    34  4
5     5   ENQ      21    15  NAK      37    25  %         53    35  5
6     6   ACK      22    16  SYN      38    26  &         54    36  6
7     7   BEL      23    17  ETB      39    27  '         55    37  7
8     8   BS       24    18  CAN      40    28  (         56    38  8
9     9   TAB      25    19  EM       41    29  )         57    39  9
10    A   LF       26    1A  SUB      42    2A  *         58    3A  :
11    B   VT       27    1B  ESC      43    2B  +         59    3B  ;
12    C   FF       28    1C  FS       44    2C  ,         60    3C  <
13    D   CR       29    1D  GS       45    2D  -         61    3D  =
14    E   SO       30    1E  RS       46    2E  .         62    3E  >
15    F   SI       31    1F  US       47    2F  /         63    3F  ?

ASCII Hex Symbol   ASCII Hex Symbol   ASCII Hex Symbol    ASCII Hex Symbol
64    40  @        80    50  P        96    60  `         112   70  p
65    41  A        81    51  Q        97    61  a         113   71  q
66    42  B        82    52  R        98    62  b         114   72  r
67    43  C        83    53  S        99    63  c         115   73  s
68    44  D        84    54  T        100   64  d         116   74  t
69    45  E        85    55  U        101   65  e         117   75  u
70    46  F        86    56  V        102   66  f         118   76  v
71    47  G        87    57  W        103   67  g         119   77  w
72    48  H        88    58  X        104   68  h         120   78  x
73    49  I        89    59  Y        105   69  i         121   79  y
74    4A  J        90    5A  Z        106   6A  j         122   7A  z
75    4B  K        91    5B  [        107   6B  k         123   7B  {
76    4C  L        92    5C  \        108   6C  l         124   7C  |
77    4D  M        93    5D  ]        109   6D  m         125   7D  }
78    4E  N        94    5E  ^        110   6E  n         126   7E  ~
79    4F  O        95    5F  _        111   6F  o         127   7F  DEL
As can be seen from the above table, the number of valid symbols that can
be entered can't be greater than 94: standard ASCII uses 33 of its 128 codes
(0 to 31 and 127) for control purposes, which leaves 94 printable symbols
when the space character is excluded as well.
Input that breaks these rules needs to be rejected by prompting the proper
error message. Huffman coding has the important feature of being “prefix
free”, i.e., no code word is a prefix of any other code word, so no
ambiguity can arise among the Huffman code words, and this leads to correct
decoding of the message.
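The prefix property can be checked mechanically, and it is what lets a decoder read a bit stream left to right without separators. The sketch below is only an illustration under stated assumptions: the function names isPrefixFree and decode are hypothetical, and code words are held as strings of '0' and '1' characters rather than packed bits.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// True when no code word is a prefix of a different code word.
// Duplicate code words also count as a violation.
bool isPrefixFree(const std::vector<std::string>& codes) {
    for (std::size_t i = 0; i < codes.size(); ++i) {
        for (std::size_t j = 0; j < codes.size(); ++j) {
            if (i == j) continue;
            // compare(...) == 0 means codes[j] starts with codes[i].
            if (codes[j].compare(0, codes[i].size(), codes[i]) == 0) {
                return false;
            }
        }
    }
    return true;
}

// Decodes a bit string by emitting a symbol as soon as the bits read so far
// match a code word. This is unambiguous only for a prefix-free table.
std::string decode(const std::string& bits,
                   const std::map<std::string, char>& table) {
    std::string output;
    std::string current;
    for (char bit : bits) {
        current += bit;
        std::map<std::string, char>::const_iterator it = table.find(current);
        if (it != table.end()) {
            output += it->second;
            current.clear();
        }
    }
    return output;
}
```

With the table {0, 10, 110, 111} the check succeeds, whereas {0, 01} fails because 0 is a prefix of 01.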
2.3 SRS (software requirements specification):
The customer requirements identified during the requirements gathering and
analysis activity are organized into an SRS document. The important
components of this document are:
Purpose:
a) Data compression: The main purpose of the software is to generate
Huffman codes used for data compression.
b) Overview: In computer science and information theory, Huffman
coding is an algorithm used for lossless data compression. The term refers to
the use of a variable-length code table for encoding a source symbol (such
as a character in a file) where the variable-length code table has been
derived in a particular way based on the estimated probability of occurrence
for each possible value of the source symbol. It was developed by David A.
Huffman at MIT, and published in the 1952 paper "A Method for the
Construction of Minimum-Redundancy Codes". Huffman coding uses a
specific method for choosing the representation for each symbol, resulting in
a prefix code that expresses the most common characters using shorter
strings of bits than are used for less common source symbols.
c) End users: Information theory is a branch of applied mathematics
and electrical engineering involving the quantification of information. It is
the study of ways in which information can be represented or encoded. The
Huffman Coding Software is meant for information theory professionals.
• Functional requirements:
Functional requirements describe the functions to be supported by the
system. Each function can be characterized by the input data, the processing
required on the input data and the output data to be produced. The main
functions to be supported are:
1. The user provides a symbol and its corresponding frequency as input.
The frequency has to be a positive integer.
2. The symbol and its corresponding frequency are inserted into a node of
the priority queue.
3. Once all the symbols and their corresponding frequencies have been
inserted into the priority queue, we build the Huffman tree for the symbols.
4. Once the complete tree is created, we determine the Huffman code of
each symbol by traversing the tree.
• Non functional requirements:
Hardware requirements: 16 MB RAM
Software requirements: Windows, UNIX or Macintosh OS and a C++ compiler.

Ways for executing on Unix or Mac
1. Download GCC; on a Debian-based Linux system type “sudo apt-get install
build-essential” at the terminal.
2. Make minor changes to run on a POSIX platform, i.e. Unix or Mac; e.g.
replace clrscr with clear, which is the corresponding command in Linux
for the same purpose.
3. The GNU Compiler Collection (usually shortened to GCC) is a compiler
system produced by the GNU Project supporting various programming
languages.
4. GCC has been adopted as the standard compiler by most other modern
Unix-like computer operating systems, including GNU/Linux, the BSD
family and Mac OS X.
5. Command for compiling on Unix after the installation of the GCC
components: “g++ -Wall -W -Werror huff.cpp -o huff”

• Goals of implementation:
In computer science, data compression or source coding is the process
of encoding information using fewer bits (or other information-bearing units)
than an unencoded representation would use, through use of specific
encoding schemes.
As with any communication, compressed data communication only works
when both the sender and receiver of the information understand the
encoding scheme. For example, this text makes sense only if the receiver
understands that it is intended to be interpreted as characters representing
the English language. Similarly, compressed data can only be understood if
the decoding method is known by the receiver.
Compression is useful because it helps reduce the consumption of expensive
resources, such as hard disk space or transmission bandwidth. On the
downside, compressed data must be decompressed to be used, and this
extra processing may be detrimental to some applications. For instance, a
compression scheme for video may require expensive hardware for the video
to be decompressed fast enough to be viewed as it's being decompressed
(the option of decompressing the video in full before watching it may be
inconvenient, and requires storage space for the decompressed video). The
design of data compression schemes therefore involves trade-offs among
various factors, including the degree of compression, the amount of
distortion introduced (if using a lossy compression scheme), and the
computational resources required to compress and uncompress the data.
The goal of the Huffman coding implementation is data compression: the
process of reducing the amount of data required to represent a given
quantity of information, as illustrated by the following example.
A source generates 4 different symbols {a1, a2, a3, a4} with probabilities
{0.4, 0.35, 0.2, 0.05}. A binary tree is generated from left to right by
taking the two least probable symbols and putting them together to form
another equivalent symbol whose probability equals the sum of the two. The
process is repeated until there is just one symbol. The tree can then be read
backwards, from right to left, assigning different bits to different branches.
The final Huffman code is:
Symbol  Code
a1      0
a2      10
a3      110
a4      111
The standard way to represent a signal made of 4 symbols is by using
2 bits/symbol. If this Huffman code is used to represent the signal, then
the average length is lowered to 1.85 bits/symbol. The technique works by
creating a binary tree of nodes. Thus, on average, this achieves a saving of
0.15 bits/symbol, and data compression is thereby achieved.
In run-length encoding, large runs of consecutive identical data values are
replaced by a simple code with the data value and length of the run. This is
an example of lossless data compression. It is
often used to optimize disk space on office computers, or better use the
connection bandwidth in a computer network. For symbolic data such as
spreadsheets, text, executable programs, etc., losslessness is essential
because changing even a single bit cannot be tolerated (except in some
limited cases).
For visual and audio data, some loss of quality can be tolerated without
losing the essential nature of the data. By taking advantage of the limitations
of the human sensory system, a great deal of space can be saved while
producing an output which is nearly indistinguishable from the original.
These lossy data compression methods typically offer a three-way tradeoff
between compression speed, compressed data size and quality loss.
Lossy image compression is used in digital cameras, to increase storage
capacities with minimal degradation of picture quality. Similarly, DVDs use
the lossy MPEG-2 Video codec for video compression.
In lossy audio compression, methods of psychoacoustics are used to remove
non-audible (or less audible) components of the signal. Compression of
human speech is often performed with even more specialized techniques, so
that "speech compression" or "voice coding" is sometimes distinguished as a
separate discipline from "audio compression". Different audio and speech
compression standards are listed under audio codecs. Voice compression is
used in Internet telephony for example, while audio compression is used for
CD ripping and is decoded by audio players.
Huffman coding today is often used as a "back-end" to some other
compression method. DEFLATE (PKZIP's algorithm) has a front-end model, and
multimedia codecs such as JPEG and MP3 have a front-end model and
quantization, followed by Huffman coding.
Example implementations:
• DEFLATE (a combination of LZ77 and Huffman coding), used by ZIP,
gzip and PNG files
• Unix pack utility (the .z file format), which used Huffman coding
• bzip2 (a combination of the Burrows-Wheeler transform and Huffman
coding)
• JPEG (image compression using a discrete cosine transform, then
quantization, then Huffman coding)
• MPEG (audio and video compression standards family in wide use,
using DCT and motion-compensated prediction for video)
o MP3 (a part of the MPEG-1 standard for sound and music
compression, using subbanding and MDCT, perceptual modeling,
quantization, and Huffman coding)
o AAC (part of the MPEG-2 and MPEG-4 audio coding specifications,
using MDCT, perceptual modeling, quantization, and Huffman
coding)
DESIGN PROCESS FOR HUFFMAN CODING
During the software design phase, the designer transforms the SRS
document into the design document. The design document produced at the end
of the design phase is then implemented using a programming language in the
coding phase.
The items taken into consideration in the design phase are the different
modules which constitute the system. Control relationships and interfaces
among the modules are identified. Suitable data structures for the data
to be stored need to be properly designed and documented.
One of the basic steps in the design process involves a graphical
representation of our main problem. We use DFDs (data flow diagrams) for
this graphical representation.
Context level DFD

[Figure: context-level DFD. The frequency distribution table flows into the
Huffman Coding process, which outputs the Huffman codes.]
The Huffman software takes the frequency distribution table as input and
computes the corresponding Huffman codes for each symbol.
Level 1 DFD
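[Figure: Level 1 DFD. Process 0.1 "Take input from user" receives the input
and passes valid input to process 0.2 "Insert into priority queue". Priority
queue nodes flow to process 0.3 "Insert into tree", which returns composite
nodes to the priority queue. The tree and a symbol flow to process 0.4
"Generate codes", which outputs the code.]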
In the Level 1 DFD we provide a character and its corresponding frequency as
input; if the input is valid, it is inserted into the priority queue.
Once we have inserted all the symbols and their corresponding frequencies,
we dequeue the two nodes of lowest frequency from the priority queue and form
a composite node of type tree, which is then reinserted into the priority
queue if the queue is not empty. Once the complete tree is created, we
determine the Huffman code of each symbol by traversing the tree.
Level 2 DFD (0.1)
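[Figure: Level 2 DFD for process 0.1. The input flows through 0.1.1 "Take no.
of symbols", 0.1.2 "Enter the symbol" and 0.1.3 "Enter frequency"; each
process passes only valid data to the next.]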
In this DFD, we take the number of symbols as input; the user can enter
only integers from 1 to 94. If the user enters any other value, e.g.
characters, an error message is displayed and the user must enter a valid
input. After a valid number of symbols has been entered, the user must enter
each symbol for which a Huffman code is required. After a valid symbol has
been taken, we take the frequency of the character. Here we accept only
integer values; if the user provides input other than an integer, an error
message is displayed.
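The three validation stages described for this DFD can be sketched as small predicate functions. This is an illustration only, since the project's own input code is not shown here: the names parseBoundedInt, isValidSymbolCount, isValidFrequency and isValidSymbol are hypothetical, the upper bound on a frequency is an assumption of the sketch, and the duplicate check takes the symbols entered so far as a plain string.

```cpp
#include <cctype>
#include <string>

// Returns true and stores the value when 'text' consists only of digits and
// lies in [minValue, maxValue]. Signs, letters and empty input are rejected.
bool parseBoundedInt(const std::string& text, int minValue, int maxValue,
                     int& value) {
    if (text.empty() || text.size() > 9) return false;  // 9 digits fit in int
    int parsed = 0;
    for (char ch : text) {
        if (!std::isdigit(static_cast<unsigned char>(ch))) return false;
        parsed = parsed * 10 + (ch - '0');
    }
    if (parsed < minValue || parsed > maxValue) return false;
    value = parsed;
    return true;
}

// 0.1.1: the number of symbols must be an integer from 1 to 94.
bool isValidSymbolCount(const std::string& text, int& count) {
    return parseBoundedInt(text, 1, 94, count);
}

// 0.1.3: the frequency must be a positive integer (upper bound assumed here).
bool isValidFrequency(const std::string& text, int& frequency) {
    return parseBoundedInt(text, 1, 999999999, frequency);
}

// 0.1.2: exactly one printable, non-space character that is not a repeat.
bool isValidSymbol(const std::string& text, const std::string& alreadyEntered) {
    return text.size() == 1 &&
           std::isgraph(static_cast<unsigned char>(text[0])) != 0 &&
           alreadyEntered.find(text[0]) == std::string::npos;
}
```

A caller would keep prompting until the relevant predicate returns true, printing an error message after each rejected entry.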

Level 2 DFD (0.2)
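[Figure: Level 2 DFD for process 0.2. A valid symbol or a composite node of
the tree enters 0.2.1 "Calculate location in Pque", which passes the position
to 0.2.2 "Insert node in Pque", producing a priority queue node.]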
In this DFD, the proper location of the valid symbol or composite node is
determined and the node is placed at that location. If the front of the
priority queue is null, the node is inserted at the beginning of the priority
queue; otherwise the node is placed at a specific position depending on its
frequency.
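The position search and insertion can be sketched on a singly linked list as follows. This is a hedged illustration rather than the project's code: QueueNode, enqueue and dequeue are hypothetical names, and placing a new node after existing nodes of equal frequency is an assumption of this sketch, not necessarily the tie-breaking rule the project uses.

```cpp
#include <string>

// Hypothetical queue node: symbol(s), frequency and link to the next node.
struct QueueNode {
    std::string symbols;
    int freq;
    QueueNode* next;
};

// Inserts 'node' so that the list stays in non-decreasing frequency order.
void enqueue(QueueNode*& front, QueueNode* node) {
    // Empty queue, or the new node is smaller than the current front:
    // insert at the beginning.
    if (front == nullptr || node->freq < front->freq) {
        node->next = front;
        front = node;
        return;
    }
    // Otherwise walk to the last node whose successor is not larger.
    QueueNode* previous = front;
    while (previous->next != nullptr && previous->next->freq <= node->freq) {
        previous = previous->next;
    }
    node->next = previous->next;
    previous->next = node;
}

// Removes and returns the lowest-frequency node, or nullptr if empty.
QueueNode* dequeue(QueueNode*& front) {
    QueueNode* node = front;
    if (front != nullptr) {
        front = front->next;
        node->next = nullptr;
    }
    return node;
}
```

Because the list is kept sorted on insertion, dequeuing is a constant-time removal from the front.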

LEVEL 2 DFD (0.4)
[Figure: Level 2 DFD for process 0.4. A symbol enters 0.4.1 "Calculate path
from root to leaf"; the path found goes to 0.4.2 "Traverse path assigning 0
to left and 1 to right child", which outputs the code.]
Here the input is the symbol for which we have to generate the Huffman
code. While traversing from the root to the leaf (symbol), we assign a 0 each
time we move to a left child and a 1 each time we move to a right child.
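One way to realise this root-to-leaf walk is to let every composite node carry the symbols found beneath it, so that each step can test whether the wanted symbol lies in the left subtree. The sketch below assumes exactly that representation; TreeNode and encodeSymbol are hypothetical names, and the result is returned as a string rather than printed.

```cpp
#include <string>

// Hypothetical tree node: a leaf holds one symbol, a composite node holds
// all the symbols of the leaves below it.
struct TreeNode {
    std::string symbols;
    const TreeNode* left;
    const TreeNode* right;
};

// Returns the code for 'symbol', or an empty string if the symbol is not in
// the tree (or the tree is a single leaf, which has an empty path).
std::string encodeSymbol(const TreeNode* root, char symbol) {
    std::string code;
    if (root == nullptr || root->symbols.find(symbol) == std::string::npos) {
        return code;
    }
    const TreeNode* node = root;
    // Descend until a leaf is reached; internal nodes have both children.
    while (node->left != nullptr && node->right != nullptr) {
        if (node->left->symbols.find(symbol) != std::string::npos) {
            code += '0';  // symbol lies in the left subtree
            node = node->left;
        } else {
            code += '1';  // otherwise it must lie in the right subtree
            node = node->right;
        }
    }
    return code;
}
```

Each lookup costs one substring search per level, which is acceptable for an alphabet of at most 94 symbols.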

Data Dictionary
Name                          Description
Composite node                Tree or priority queue node
Tree                          Symbol, its frequency and pointers to left and right children
Priority queue                Symbol, its frequency and pointer to next node in the queue
Frequency distribution table  Symbols and their frequencies
Huffman code                  Bit pattern used to represent a symbol
Root                          Root node of the Huffman tree
Leaf                          Symbols entered (having no children)
Path                          Unique sequence of edges from root to a symbol

3.2 Structured design
3.2.1 Data design:
The main operations that the data structure must support are as follows:
- It must represent a binary Huffman tree with nonnegative frequencies.
- It must store a list of internal priority queue nodes in non-decreasing
order.
The Huffman algorithm uses two main data structures:
1. Priority queue:
A priority queue node holds the symbol, its frequency and the pointer to
the next node in the queue. This data structure is used to create a node
in the priority queue, which is kept in increasing order of frequency.
The main functions that manipulate this priority queue are finding the
position for a node in the queue based on its frequency and dequeuing the
first node of the priority queue for composite node creation.
2. Binary tree:
A binary tree node holds the symbol, the frequency of the character, and
pointers to the right and left children. This data structure is used to
create a node of the Huffman tree. Finally, the Huffman tree is traversed
to generate the Huffman codes.

3.2.2 Work break down structure (wbs):
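[Figure: work breakdown structure. MAIN is at the top; below it are WELCOME,
INPUT, INSERT, DISPLAY and ENCODE; the lower levels show ENQUE, DEQUE and FP
(find position).]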
The main module calls the input module, which in turn calls the enque
module, which inserts the nodes into the priority queue at the proper
position based on frequency by calling the find position module. The main
module then calls the insert module. The insert module calls the deque
module, which returns the node with the lowest frequency. This information is
used to create a composite node of type tree. The sum of the frequencies and
the concatenation of the symbols of the two dequeued nodes are then used as
parameters to create a composite node of type ‘priority queue’, which is
inserted into the queue if the queue is not empty. The main module may call
the display module to display the symbols in in-order manner for debugging
purposes. Finally, the encode module is used to generate the Huffman code
for each symbol.
3.2.3 Procedural design
3.2.3.1 Flowcharts:
[Flowchart for main: start; enter frequency and symbol; repeat until all
symbols are entered; insert symbol in priority queue in increasing order;
build Huffman tree; traverse Huffman tree to generate code; stop.]
[Flowchart for enque: start; create a priority queue node; find position; if
position = Null, insert at beginning, otherwise insert at specific position;
stop.]
[Flowchart for deque: start; pointer = front; front = front->next; return
pointer; stop.]
[Flowchart for display: start; if root != Null: root = left child, display
root, root = right child.]
[Flowchart for insertion: start; if front = Null, stop; otherwise dequeue two
nodes of lowest frequency; create composite node of tree; adjust nodes of
tree; if node has children, assign left and right child; if the priority
queue is empty, stop, otherwise insert composite node into priority queue
and repeat.]
[Flowchart for encode: start; while symbol != node: if symbol found in left
child, print 0 and node = left child, otherwise print 1 and node = right
child; stop.]
3.2.3.2 Algorithms
Algorithm for main:
Step 1: Begin.
Step 2: Take frequency and symbol as input.
Step 3: Repeat Step 2 until all symbols have been entered.
Step 4: Insert the symbols into the priority queue in increasing order.
Step 5: Build the Huffman tree.
Step 6: Traverse the tree to generate the Huffman codes.
Step 7: Stop.
Algorithm for encode:
Step 1: Begin.
Step 2: While symbol != Node
If symbol found in left subtree
Print 0
Node = left child
Else
Print 1
Node = right child
Step 3: Stop.

Algorithm for display:
Step 1: Begin.
Step 2: If root != NULL
Display(left child of root)
Display the symbol of root.
Display(right child of root)
Step 3: Stop.
Algorithm for enque:
Step 1: Begin.
Step 2: Create a node of the priority queue.
Step 3: Find the position of the symbol in the priority queue.
Step 4: Insert the node at that position.
Step 5: Stop.

Algorithm for deque:
Step 1: Begin.
Step 2: Return the node having the lowest frequency from the priority queue.
Step 3: Update the front of the queue.
Step 4: Stop.
Algorithm for insertion:
Step 1: Begin.
Step 2: Dequeue the two nodes of lowest frequency.
Step 3: Create a composite node of the tree.
Step 4: Adjust the subtrees as its left and right children.
Step 5: If front != NULL, insert the composite node into the priority queue
and go to Step 2.
Step 6: Stop.

CODING
/* HUFFMAN CODING (Mini Project)*/
/*
Implemented By :
Asif Iqbal,Aquib Rahid Pandit,Kaleem Dar & Sahil Sholla
*/
/*header files:*/
#include<iostream.h>
#include<string.h>
#include<math.h>
#include<stdlib.h>
#include<conio.h>
#include<ctype.h>
#include<graphics.h>
/*Global declarations*/
int n;
char b[94][2];
/* Structure Specifications */

/* Binary Tree Node Specification */
/* 'tree' is a Binary Tree, that holds symbol,its
frequency and pointers to left and right children */
struct tree
{
char a[94];
int s;
struct tree *left,*right;
}*root=NULL,*tt[47]={NULL},*temp,*temp2,*t2,*ri,*le;
/* Priority Queue Node Specification */
/* 'pqu'is a priority queue node that holds symbol,its
frequency and pointer to next node in the queue*/
struct pqu
{
int info;
char a[94];
struct pqu *ptr;
}*front=NULL,*t,*par,*t1,*p1,*p2;
[Code has been hidden for security reasons]
//main program
void main()
{
int i;
welcome();
input();
insert();
//disp(root);
clrscr();
cout<<"\n\nThe corresponding codes are...\n\n";
/* print each symbol followed by its Huffman code */
for(i=0;i<n;i++)
{
cout<<b[i] << "==>";
encode(b[i]);
cout<<"\t";
}
getch();
}
TESTING
Our Huffman coding software consists of different modules. An error in any
one of these modules will result in a system error, and if these errors are
not debugged the result will be a defective system. This may lead to
rejection by the customer at the customer testing phase, and thus to project
failure. Thus testing is an important activity. We did not carry out separate
unit testing, as our program is simple with few modules. However, we did
carry out phased testing of our code to verify that it works properly. The
phases covered the modules that input the frequency table, enqueue the
symbols into the priority queue at their specific position according to
their frequency (this task is performed by the find position (fp) module),
and then dequeue the symbols from the priority queue and insert them into
the binary tree.
At this phase we included a debugging module named display to display the
symbols of the complete binary tree by inorder traversal. If the result of
the display module was not what was expected, the errors in the code were
identified and corrected. We have adopted some conventions, as specified in
the SRS document. They are:
1. We assign 0 to the left child and 1 to the right child in the tree.
2. When a new composite node has the same frequency as a symbol already
present in the priority queue, the one which enters the priority queue later
is placed at higher priority.
3. We assign the symbol with higher frequency as the left child and the one
with lower frequency as the right child in the tree.
4. We assign 1 for any string or character in the case of only one character.
By taking all the above conventions into consideration, we conducted
integration testing using the phased approach because of the small number of
modules.
Test Case (0)
Input:
• Total number of symbols = 6.
• Frequency distribution table as:-
{ a=3,b=3,c=4,d=4,e=5,f=5 }
Expected output:
• ffeefebadcbbaabadcddcc
Result:
• Verification successful
Elaboration of above case
Iteration (1)
Priority queue: front -> a=3, b=3, c=4, d=4, e=5, f=5
Processing:
for (i=0, z=0)
p1 = address of (a), p2 = address of (b)
a1 = a, b1 = b
[Figure: tree after iteration 1. Node ba=6 (pointers root, temp, tt[0],
temp2) with children b=3 and a=3; the child pointers of both leaves are
NULL.]
Iteration (2)
Priority queue: front -> c=4, d=4, e=5, f=5, ba=6
Processing:
for (i=0, z=1)
p1 = address of (c), p2 = address of (d)
a1 = c, b1 = d
[Figure: tree after iteration 2. Node dc=8 (pointers root, temp, tt[1],
temp2) with children d=4 and c=4; the child pointers of both leaves are
NULL.]
Iteration (3)
Priority queue: front -> e=5, f=5, ba=6, dc=8
Processing:
for (i=0, z=2)
p1 = address of (e), p2 = address of (f)
a1 = e, b1 = f
[Figure: tree after iteration 3. Node fe=10 (pointers root, temp, tt[2],
temp2) with children f=5 and e=5; the child pointers of both leaves are
NULL.]
Iteration (4)
Priority queue: front -> ba=6, dc=8, fe=10
Processing:
for (i=0, z=3)
p1 = address of (ba), p2 = address of (dc)
a1 = ba, b1 = dc
[Figure: tree after iteration 4. Node badc=14 (pointers root, temp, tt[2],
temp2) with children ba=6 (b=3, a=3) and dc=8 (d=4, c=4); the child pointers
of all four leaves are NULL.]
Iteration (5)
Priority queue: front -> fe=10, badc=14
Processing:
for (i=0, z=4)
p1 = address of (fe), p2 = address of (badc)
a1 = fe, b1 = badc
[Figure: final tree. Root febadc=24 with children fe=10 (f=5, e=5) and
badc=14; badc has children ba=6 (b=3, a=3) and dc=8 (d=4, c=4).]
• The inorder traversal of the above tree yields
“ffeefebadcbbaabadcddcc”,
which was the expected output. Thus the routines involved in the creation of
the priority queue and the Huffman tree are verified to be correct. This
completes the phased integration testing.

Once the phased testing was complete, all the other modules were
integrated to form the complete Huffman software. Then system testing was
conducted according to the following plan:
We first designed the test cases with which the software was to be tested.
The test cases used were:
CASE(1)
Input:
• no. of symbols = a
Expected = error message
Result: Success
For test case (1) we incorporated lines of code that check whether the
input for the number of symbols is a number or not. This validation is
performed using the isdigit( ) function, which is declared in the ctype
library.
CASE (2)
Input:
• no of symbols = 0
Result: success
We simply check whether the input for the number of symbols is 0, display
a message asking for a non-zero number, and then continue the program to take
new input.
CASE (3).
Input:
• no. of symbols = -1
Result: success
tn[0] contains the negative sign, so isdigit( ) returns false and the message
to enter a valid positive integer is displayed again.
CASE(4).
Input:
• no. of symbols = 1
Expected=error message
Result: success
We don’t need to execute the Huffman algorithm because it is not necessary,
nor will it result in any data compression. Thus we simply exit the program.
CASE (5).
input
enter symbol = asdd
Result: success
strlen is used to first calculate the length of the string entered, and if it
is more than 1 a message is displayed asking the user to enter one symbol only.
CASE (6).
Input:
• no.of symbols = 98
Result: success
We check the condition n>94, i.e. if the number of symbols specified is more
than 94, we display a message asking the user to enter a number no greater
than 94.
CASE (7).
Input
• enter symbol = aa

Result: success
To eliminate redundancy or ambiguity, we include a loop which runs over the
number of symbols, comparing the newly input symbol with the previously
input symbols stored in b[ ]; if a match is found, a message is displayed
notifying the user that the symbol has already been entered, and the current
loop is exited.
After all the above validation we performed regression testing of all the
previous cases and verified the results. All the cases behaved as expected,
and thus system testing is complete.

Future scope:
The Huffman coding that we have considered is simple binary Huffman coding,
but many variations of Huffman coding exist:
1. n-ary Huffman coding:
The n-ary Huffman algorithm uses the {0, 1, ... , n − 1} alphabet to encode
the message and build an n-ary tree. This approach was considered by Huffman
in his original paper. The same algorithm applies as for binary (n equals 2)
codes, except that the n least probable symbols are taken together, instead
of just the 2 least probable. Note that for n greater than 2, not all sets of
source words can properly form an n-ary tree for Huffman coding. In this
case, additional 0-probability placeholders must be added. Each merge
replaces n nodes by one, reducing the count by n − 1, so the set of source
words forms a proper Huffman tree exactly when the number of source words is
congruent to 1 modulo n − 1.
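
The number of 0-probability placeholders follows directly from the congruence
condition. The function below is a small sketch of that arithmetic; the name
placeholdersNeeded is hypothetical, and it assumes at least one source word
and an arity of at least 2.

```cpp
// Number of 0-probability placeholders to add so that
// (symbolCount + placeholders) is congruent to 1 modulo (arity - 1).
// Assumes symbolCount >= 1 and arity >= 2.
int placeholdersNeeded(int symbolCount, int arity) {
    int step = arity - 1;                      // nodes removed per merge
    int remainder = (symbolCount - 1) % step;  // distance past a proper count
    return (step - remainder) % step;          // 0 when already proper
}
```

For example, 6 source words with a ternary code (n = 3) need one placeholder,
giving 7 words, and 7 is congruent to 1 modulo 2. For binary codes the result
is always 0, which is why the issue does not arise in the binary case.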
2. Adaptive Huffman coding:
A variation called adaptive Huffman coding calculates the probabilities
dynamically based on recent actual frequencies in the source string. This is
somewhat related to the LZ family of algorithms.
3. Huffman template algorithm:
Most often, the weights used in implementations of Huffman coding
represent numeric probabilities, but the algorithm given above does not
require this; it requires only a way to order weights and to add them. The
Huffman template algorithm enables one to use any kind of weights (costs,
frequencies, etc.).
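
The idea that only ordering and addition are required maps naturally onto a
C++ template. The following is a minimal sketch, not the project's
implementation: the function name huffmanCodeLengths is hypothetical, and the
weight type W is assumed to provide only operator< and operator+. It returns
the code length of each symbol rather than the codes themselves.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Computes Huffman code lengths for weights of any type W that supports
// operator< (ordering) and operator+ (combining two weights).
template <typename W>
std::vector<int> huffmanCodeLengths(const std::vector<W>& weights) {
    struct Node { W weight; int parent; };
    std::vector<Node> nodes;
    for (const W& w : weights) nodes.push_back(Node{w, -1});

    // Min-heap of node indices; the comparison uses only operator<.
    auto heavier = [&nodes](int a, int b) {
        return nodes[b].weight < nodes[a].weight;
    };
    std::priority_queue<int, std::vector<int>, decltype(heavier)> heap(heavier);
    for (int i = 0; i < static_cast<int>(weights.size()); ++i) heap.push(i);

    // Repeatedly merge the two lightest nodes under a new parent.
    while (heap.size() > 1) {
        int a = heap.top(); heap.pop();
        int b = heap.top(); heap.pop();
        Node merged{nodes[a].weight + nodes[b].weight, -1};
        nodes.push_back(merged);
        int parent = static_cast<int>(nodes.size()) - 1;
        nodes[a].parent = parent;
        nodes[b].parent = parent;
        heap.push(parent);
    }

    // A symbol's code length is its depth: count the steps up to the root.
    std::vector<int> lengths(weights.size(), 0);
    for (std::size_t i = 0; i < weights.size(); ++i) {
        for (int p = nodes[i].parent; p != -1; p = nodes[p].parent) ++lengths[i];
    }
    return lengths;
}
```

The same function can be called with integer frequencies or with
floating-point probabilities without any change, which is the point of the
template formulation.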
4. Length-limited Huffman coding:
Length-limited Huffman coding is a variant where the goal is still to achieve a
minimum weighted path length, but there is an additional restriction that the
length of each codeword must be less than a given constant. The
package-merge algorithm solves this problem with a simple greedy approach very
similar to that used by Huffman's algorithm. Its time complexity is O(nL),
where L is the maximum length of a codeword. No algorithm is known to
solve this problem in linear or linearithmic time, unlike the presorted and
unsorted conventional Huffman problems, respectively.
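
The package-merge procedure can be sketched compactly. The code below is an
illustrative sketch under assumptions, not a tuned implementation: the names
Item and lengthLimitedCodeLengths are hypothetical, maxLength is taken as the
largest permitted codeword length (lengths may equal it), and the input must
satisfy 2^maxLength >= number of symbols. For clarity each item carries a
copy of its symbol list, so this version does not reach the O(nL) bound.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// One item in the package-merge procedure: a weight plus the symbols it covers.
struct Item {
    unsigned long weight;
    std::vector<int> symbols;
};

// Returns one code length per symbol, each at most maxLength.
// Assumes at least two symbols and 2^maxLength >= number of symbols.
std::vector<int> lengthLimitedCodeLengths(
        const std::vector<unsigned long>& weights, int maxLength) {
    const std::size_t n = weights.size();
    auto lighter = [](const Item& a, const Item& b) {
        return a.weight < b.weight;
    };

    // One leaf item per symbol, sorted by weight.
    std::vector<Item> leaves;
    for (std::size_t i = 0; i < n; ++i) {
        Item leaf{weights[i], {static_cast<int>(i)}};
        leaves.push_back(leaf);
    }
    std::stable_sort(leaves.begin(), leaves.end(), lighter);

    std::vector<Item> current = leaves;
    for (int level = 1; level < maxLength; ++level) {
        // Package: pair up neighbours, dropping an unpaired last item.
        std::vector<Item> packages;
        for (std::size_t i = 0; i + 1 < current.size(); i += 2) {
            Item p = current[i];
            p.weight += current[i + 1].weight;
            p.symbols.insert(p.symbols.end(), current[i + 1].symbols.begin(),
                             current[i + 1].symbols.end());
            packages.push_back(p);
        }
        // Merge: combine the packages with a fresh copy of the leaves,
        // keeping the result sorted by weight.
        current.clear();
        std::merge(leaves.begin(), leaves.end(), packages.begin(),
                   packages.end(), std::back_inserter(current), lighter);
    }

    // The 2n - 2 lightest items decide the lengths: a symbol's code length
    // is the number of selected items in which it appears.
    std::vector<int> lengths(n, 0);
    for (std::size_t i = 0; i < 2 * n - 2 && i < current.size(); ++i) {
        for (int s : current[i].symbols) ++lengths[s];
    }
    return lengths;
}
```

With weights 1, 2, 4, 8 and a limit of 3 the sketch returns the ordinary
Huffman lengths 3, 3, 2, 1; tightening the limit to 2 forces every codeword
to length 2.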
5. Huffman coding with unequal letter costs:
In the standard Huffman coding problem, it is assumed that each symbol in
the set that the code words are constructed from has an equal cost to
transmit: a code word whose length is N digits will always have a cost of N,
no matter how many of those digits are 0s, how many are 1s, etc. When
working under this assumption, minimizing the total cost of the message and
minimizing the total number of digits are the same thing. Huffman coding
with unequal letter costs is the generalization in which this assumption no
longer holds: the letters of the encoding alphabet may have non-uniform
lengths, due to characteristics of the transmission medium. An
example is the encoding alphabet of Morse code, where a 'dash' takes longer
to send than a 'dot', and therefore the cost of a dash in transmission time is
higher. The goal is still to minimize the weighted average codeword length,
but it is no longer sufficient just to minimize the number of symbols used by
the message. No algorithm is known to solve this in the same manner or with
the same efficiency as conventional Huffman coding.
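
A short sketch shows why counting digits is no longer enough. The function
name codewordCost and the cost values used below are assumptions chosen for
illustration; the costs 1 and 3 loosely mirror a dot and a dash.

```cpp
#include <string>

// Transmission cost of one codeword over a binary alphabet whose two
// letters have different costs (hypothetical values supplied by the caller).
// Any character other than '0' is treated as a '1'.
int codewordCost(const std::string& codeword, int costOfZero, int costOfOne) {
    int total = 0;
    for (char letter : codeword) {
        total += (letter == '0') ? costOfZero : costOfOne;
    }
    return total;
}
```

With a cost of 1 for '0' and 3 for '1', the two-digit codeword "11" costs 6
while the three-digit codeword "000" costs only 3, so the shorter codeword is
the more expensive one to send. With equal costs of 1, the cost is simply the
codeword length.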
Moreover, we can extend the range of the Huffman coding software to
incorporate Unicode, which will require an interfacing module that
interprets a particular key in different languages based on the option
selected. In that case the maximum number of symbols will be of the order
of 2^32.
