FEASIBILITY STUDY
In computer science and information theory, Huffman coding is an entropy
encoding algorithm used for lossless data compression. The term refers to
the use of a variable-length code table for encoding a source symbol (such
as a character in a file) where the variable-length code table has been
derived in a particular way based on the estimated probability of occurrence
for each possible value of the source symbol. It was developed by David A.
Huffman at MIT, and published in the 1952 paper "A Method for the
Construction of Minimum-Redundancy Codes".
Huffman coding uses a specific method for choosing the representation for
each symbol, resulting in a prefix code (sometimes called a "prefix-free
code"): the bit string representing some particular symbol is never a
prefix of the bit string representing any other symbol. The code expresses the
most common characters using shorter strings of bits than are used for less
common source symbols. Huffman was able to design the most efficient
compression method of this type: no other mapping of individual source
symbols to unique strings of bits will produce a smaller average output size
when the actual symbol frequencies agree with those used to create the
code.
1.1 History:
In 1951, David A. Huffman and his MIT information theory classmates were
given the choice of a term paper or a final exam. The professor, Robert M.
Fano, assigned a term paper on the problem of finding the most efficient
binary code. Huffman, unable to prove any codes were the most efficient,
was about to give up and start studying for the final when he hit upon the
idea of using a frequency-sorted binary tree and quickly proved this method
the most efficient. In doing so, the student outdid his professor, who had
worked with information theory inventor Claude Shannon to develop a
similar code.
The construction proceeds in five steps:
1. Scan the text and count the frequency of each symbol.
2. Insert the symbols, with their frequencies, into a priority queue.
3. Build the Huffman tree by repeatedly merging the two lowest-frequency nodes.
4. Generate a code for each symbol from the tree.
5. Scan text again and create new file using the Huffman code.
The above construction algorithm uses a priority queue in which the node with
the lowest probability is given the highest priority. This priority queue is used to
build the Huffman tree, which assigns fewer bits to more frequent symbols,
while symbols that occur less frequently take up more bits. In other words,
shorter code words are assigned to more common symbols and longer code
words to less common ones; since the long code words occur less frequently,
data compression is achieved.
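This construction can be sketched in C++ using std::priority_queue from the standard library (a minimal illustration; this is not the project's own code, which appears later in the coding chapter, and it assumes at least two symbols):

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// One node of the Huffman tree: leaves carry a symbol, internal
// (composite) nodes carry the combined frequency of their children.
struct Node { char sym; int freq; Node *left, *right; };

// Comparator: the node with the LOWEST frequency gets the HIGHEST priority.
struct ByFreq {
    bool operator()(const Node* a, const Node* b) const {
        return a->freq > b->freq;   // min-heap on frequency
    }
};

// Build the Huffman tree: repeatedly dequeue the two least frequent
// nodes and merge them under a new composite node.
Node* buildTree(const std::map<char, int>& freqs) {
    std::priority_queue<Node*, std::vector<Node*>, ByFreq> pq;
    for (const auto& p : freqs)
        pq.push(new Node{p.first, p.second, nullptr, nullptr});
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();
        Node* b = pq.top(); pq.pop();
        pq.push(new Node{'\0', a->freq + b->freq, a, b});
    }
    return pq.top();
}

// Walk the tree, appending 0 for every left edge and 1 for every right edge.
void assignCodes(const Node* n, const std::string& prefix,
                 std::map<char, std::string>& out) {
    if (!n->left && !n->right) { out[n->sym] = prefix; return; }
    assignCodes(n->left,  prefix + "0", out);
    assignCodes(n->right, prefix + "1", out);
}
```

Applied to the frequencies used later in the testing chapter, {a=3, b=3, c=4, d=4, e=5, f=5}, this yields 3-bit codes for a through d and 2-bit codes for e and f. (For brevity the sketch never frees its nodes.)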
“To design the system that will allow the user to enter the total number of
characters with their frequencies at the terminal and then display the
Huffman codes on the terminal in an interactive manner.”
We need a C++ compiler and a computer for our Huffman algorithm, both of
which are available. The personnel involved with the project should be well
versed in the basic concepts of C, C++, and data structures.
For example, the user should not enter the same symbol twice, as this would
result in inconsistency. This would lead to errors, since each symbol has only
one Huffman code.
Dec Hex Char   Dec Hex Char   Dec Hex Char   Dec Hex Char
75  4B  K      91  5B  [      107 6B  k      123 7B  {
76  4C  L      92  5C  \      108 6C  l      124 7C  |
77  4D  M      93  5D  ]      109 6D  m      125 7D  }
78  4E  N      94  5E  ^      110 6E  n      126 7E  ~
79  4F  O      95  5F  _      111 6F  o      127 7F  DEL
As can be seen from the above table, the number of valid symbols that can
be entered can't be greater than 94, because standard ASCII uses 32 of its
128 codes for control purposes, and the space and DEL characters are also
excluded, leaving 94 printable symbols.
Purpose:
• Functional requirements:
2. The symbol and its corresponding frequency are inserted into a node of
the priority queue.
• Goals of implementation:
HUFFMAN CODING 2010
Run-length encoding, for example, replaces a run of identical symbols with a
single copy of the symbol and the length of the run. This is an example of
lossless data compression. It is often used to optimize disk space on office
computers, or to make better use of the connection bandwidth in a computer
network. For symbolic data such as spreadsheets, text, executable programs,
etc., losslessness is essential because changing even a single bit cannot be
tolerated (except in some limited cases).
For visual and audio data, some loss of quality can be tolerated without
losing the essential nature of the data. By taking advantage of the limitations
of the human sensory system, a great deal of space can be saved while
producing an output which is nearly indistinguishable from the original.
These lossy data compression methods typically offer a three-way tradeoff
between compression speed, compressed data size and quality loss.
Lossy image compression is used in digital cameras, to increase storage
capacities with minimal degradation of picture quality. Similarly, DVDs use
the lossy MPEG-2 Video codec for video compression.
In lossy audio compression, methods of psychoacoustics are used to remove
non-audible (or less audible) components of the signal. Compression of
human speech is often performed with even more specialized techniques, so
that "speech compression" or "voice coding" is sometimes distinguished as a
separate discipline from "audio compression". Different audio and speech
compression standards are listed under audio codecs. Voice compression is
used in Internet telephony for example, while audio compression is used for
CD ripping and is decoded by audio players.
During the software design phase, the designer transforms the SRS
document into the design document. The design document produced at the
end of the design phase should be implemented using a programming
language in the coding phase.
The items taken into consideration in the design phase are the different
modules which constitute the system. Control relationships and interfaces
among the different modules are identified. Suitable data structures for the
data to be stored need to be properly designed and documented.
Huffman
The Huffman software takes the frequency distribution table as input and
computes the corresponding Huffman codes for each symbol.
Level 1 DFD
[Figure: Level 1 DFD. Process 0.1 "Take input from user" passes valid input
to process 0.2 "Insert into priority queue"; priority queue nodes feed
process 0.3 "Insert into tree"; the composite tree node and a symbol feed
process 0.4 "Generate codes", which outputs the code.]

Level 2 DFD (process 0.1)
[Figure: process 0.1.1 "Take no. of symbols", process 0.1.2 "Enter the
symbol", and process 0.1.3 "Enter frequency", each forwarding only valid
input to the next.]
In this DFD, we take the number of symbols as input; the user can enter
only integers from 1 to 94. If the user enters any other value, e.g.
characters, an error message is displayed and the user must enter valid
input. After the valid number of characters has been obtained, the user must
enter the symbol for which the Huffman code is required. After this valid
input has been taken, we take the frequency of the character. In this case we
accept only integer values; if the user provides input other than integers, an
error message is displayed.
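The validation rules described in this DFD can be sketched as standalone helper functions (function names are illustrative, not the project's; the project's own validation uses isdigit(), as described in the testing chapter):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Check that a raw input string is a positive integer in [1, 94]:
// every character must be a digit (so "a" and "-1" are rejected),
// and the converted value must lie in range (so "0" and "98" are rejected).
bool validSymbolCount(const std::string& in) {
    if (in.empty() || in.size() > 2) return false;   // at most two digits
    for (char c : in)
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    int n = std::stoi(in);
    return n >= 1 && n <= 94;
}

// A symbol entry must be exactly one printable, non-space character
// (so "aa" is rejected, as in test case 7).
bool validSymbol(const std::string& in) {
    return in.size() == 1 &&
           std::isprint(static_cast<unsigned char>(in[0])) && in[0] != ' ';
}
```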
Level 2 DFD (process 0.2)
[Figure: process 0.2.1 "Calculate location in priority queue" takes a valid
node and produces its position; process 0.2.2 "Insert node in priority queue"
places the composite/tree node at that position in the priority queue.]
In this DFD, the proper location of the valid symbol or composite node is
determined, and the node is placed at that location. If the front of the
priority queue is null, the node is inserted at the beginning of the priority
queue; otherwise the node is placed at a specific position depending on its
frequency.
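This insertion logic can be sketched for a linked-list priority queue (a standalone illustration; node and function names are assumptions, not the project's hidden code):

```cpp
#include <cassert>

// Linked-list priority queue node, kept ordered by ascending frequency.
struct PQNode {
    int freq;
    PQNode* next;
};

// Insert keeping the list sorted: if the queue is empty or the new node's
// frequency is below the front's, it becomes the new front; otherwise we
// walk the list to the specific position determined by its frequency.
PQNode* insertSorted(PQNode* front, PQNode* node) {
    if (front == nullptr || node->freq < front->freq) {
        node->next = front;
        return node;                       // inserted at the beginning
    }
    PQNode* cur = front;
    while (cur->next && cur->next->freq <= node->freq)
        cur = cur->next;                   // find the specific position
    node->next = cur->next;
    cur->next = node;
    return front;
}
```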
Level 2 DFD (process 0.4)
[Figure: given a symbol, process 0.4.1 "Calculate path from root to leaf"
finds the path; process 0.4.2 "Traverse path assigning 0 to left and 1 to
right child" outputs the code.]
Here the input is the symbol for which we have to generate the Huffman
code. While traversing from the root to the leaf (the symbol), we assign 0 as
we move to a left child and 1 as we move to a right child.
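This traversal can be sketched for a single symbol (the node layout and names here are illustrative, not the project's):

```cpp
#include <cassert>
#include <string>

// A Huffman tree node: leaves hold a symbol, internal nodes hold '\0'.
struct TNode {
    char sym;
    TNode *left, *right;
};

// Find the path from `node` down to the leaf holding `sym`, appending
// '0' for every left edge taken and '1' for every right edge.
bool pathTo(const TNode* node, char sym, std::string& code) {
    if (!node) return false;
    if (!node->left && !node->right) return node->sym == sym;
    code.push_back('0');
    if (pathTo(node->left, sym, code)) return true;
    code.back() = '1';
    if (pathTo(node->right, sym, code)) return true;
    code.pop_back();                  // symbol not in this subtree
    return false;
}
```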
Data Dictionary

Name                          Description
Composite node                Tree or priority queue node
Tree                          Symbol, its frequency, and pointers to left and right children
Priority queue                Symbol, its frequency, and a pointer to the next node in the queue
Frequency distribution table  Symbols and their frequencies
Huffman code                  Bit pattern used to represent a symbol
Root                          Root node of the Huffman tree
Leaf                          Symbols entered (having no children)
Path                          Unique sequence of edges from the root to a symbol
The main operations that the data structures must support are as follows:
1. Priority queue: enqueue (insert a node at its proper position) and dequeue (remove the lowest-frequency node from the front).
2. Binary tree: insert (combine two nodes under a composite node) and traverse (to display the tree and generate codes).
[Structure chart: MAIN at the top, calling ENQUE and DEQUE; FP.]
The main module calls the input module, which in turn calls the enque
module, which inserts the nodes into the priority queue at the proper position
based on frequency.
3.2.3.1 Flowcharts:
[Flowchart: input loop. Start; repeat until all symbols are entered; stop.]
[Flowchart: enqueue. Start; create a priority queue node; find its position;
if the position is null, insert at the beginning, else insert at the specific
position; stop.]
[Flowchart: dequeue. Start; pointer = front; front = front->next; return
pointer; stop.]
[Flowchart: display tree. Start; if root != null, display root and repeat for
its children, until front is null; stop.]
[Flowchart: generate code. Start; while symbol != node: if the symbol is
found in the left child, print 0 and set node = left child, else print 1 and
descend right; stop.]
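The dequeue flowchart, for instance, corresponds to a sketch like this (names are illustrative, not the hidden project code):

```cpp
#include <cassert>

// Priority queue node (frequency plus link). As in the dequeue flowchart:
// save the front, advance front to front->next, and return the saved node,
// which is the lowest-frequency node since the list is kept sorted.
struct QNode {
    int freq;
    QNode* next;
};

QNode* dequeue(QNode** front) {
    QNode* ptr = *front;          // pointer = front
    if (ptr) *front = ptr->next;  // front = front->next
    return ptr;                   // return pointer
}
```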
3.2.3.2 Algorithms

Algorithm 1:
STEP 1: begin
...
STEP 7: stop.

Algorithm 2 (print code):
STEP 1: begin
STEP 2: if the symbol lies in the left child, print 0 and go to step 2; else print 1 and go to step 2.
STEP 3: stop.

Algorithm 3 (display tree):
STEP 1: begin
STEP 2: root = left child; display(root); root = right child; display(root).
STEP 3: stop.

Algorithm 4:
STEP 1: begin
...
STEP 5: stop.

Algorithm 5 (dequeue):
STEP 1: begin.
STEP 2: return the node having the lowest frequency from the priority queue.
...
STEP 4: stop.

Algorithm 6:
STEP 1: begin
...
STEP 7: stop.
CODING
/* HUFFMAN CODING (Mini Project)*/
/*
Implemented By :
*/
/* header files (Turbo C++ style, as used by the project) */
#include<iostream.h>
#include<string.h>
#include<math.h>
#include<stdlib.h>
#include<conio.h>
#include<ctype.h>
#include<graphics.h>
/* Global declarations */
int n;                        /* number of symbols */
char b[94][2];                /* symbols entered by the user */
/* Structure specifications */
struct tree                   /* Huffman tree node */
{
    char a[94];               /* symbol(s) covered by this node */
    int s;                    /* combined frequency */
    struct tree *left,*right; /* child pointers (assumed; hidden in source) */
}*root=NULL,*tt[47]={NULL},*temp,*temp2,*t2,*ri,*le;
struct pqu                    /* priority queue node */
{
    int info;                 /* frequency */
    char a[94];               /* symbol(s) */
    struct pqu *next;         /* next node in queue (assumed; hidden in source) */
}*front=NULL,*t,*par,*t1,*p1,*p2;
Code has been hidden for security reasons.
B.E COMPUTER ENGINEERING Page 35
//main program
void main()
{
    int i;
    welcome();            /* display the welcome screen */
    input();              /* read symbols and frequencies */
    insert();             /* build the priority queue and Huffman tree */
    //disp(root);
    clrscr();
    for(i=0;i<n;i++)
    {
        encode(b[i]);     /* print the Huffman code for each symbol */
        cout<<"\t";
    }
}
TESTING
1. We assign 0 to the left child and 1 to the right child in the tree.
3. We assign the symbol with the higher frequency as the left child and the
one with the lower frequency as the right child in the tree.
Input:
{ a=3,b=3,c=4,d=4,e=5,f=5 }
Expected output:
• ffeefebadcbbaabadcddcc
Result:
• Verification successful
Iteration (1)
Priority queue: front -> a=3, b=3, c=4, d=4, e=5, f=5
Processing: the two lowest-frequency nodes, b=3 and a=3, are dequeued and
combined into the composite node ba=6 (stored in tt[0]); its children are
b=3 and a=3, whose own child pointers are all null.

Iteration (2)
Priority queue: front -> c=4, d=4, e=5, f=5, ba=6
Processing: a1=c, b1=d; c=4 and d=4 are combined into the composite node
dc=8 (stored in tt[1]); its children are d=4 and c=4.

Iteration (3)
Priority queue: front -> e=5, f=5, ba=6, dc=8
Processing: for (i=0, z=2); a1=e, b1=f; e=5 and f=5 are combined into the
composite node fe=10 (stored in tt[2]); its children are f=5 and e=5.

Iteration (4)
Priority queue: front -> ba=6, dc=8, fe=10
Processing: a1=ba, b1=dc; ba=6 and dc=8 are combined into the composite
node badc=14; its children are ba=6 and dc=8.

Iteration (5)
Priority queue: front -> fe=10, badc=14
Processing: a1=fe, b1=badc; fe=10 and badc=14 are combined into the root
febadc=24. The final tree is: febadc=24 with children fe=10 (children f=5,
e=5) and badc=14 (children ba=6 (b=3, a=3) and dc=8 (d=4, c=4)).
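The trace above can be cross-checked with a short independent sketch (not part of the project's code) that computes the total encoded length from the merge weights alone, using the fact that every symbol below a merge gains one bit per merge:

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Total encoded length of a Huffman code equals the sum of the merged
// weights created while repeatedly combining the two smallest frequencies,
// exactly the composite-node weights in the iteration trace.
int totalBits(const std::vector<int>& freqs) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq(
        freqs.begin(), freqs.end());
    int total = 0;
    while (pq.size() > 1) {
        int a = pq.top(); pq.pop();
        int b = pq.top(); pq.pop();
        total += a + b;        // each symbol under this merge gains one bit
        pq.push(a + b);
    }
    return total;
}
```

For {3, 3, 4, 4, 5, 5} the successive merge weights are 6, 8, 10, 14 and 24, matching the composite nodes in the trace, for a total of 62 bits.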
This was the expected output; thus the routines involved in the creation of
the priority queue and the Huffman tree are verified to be correct. This
completes the phased integration testing.
Once the phased testing was complete, all the other modules were integrated
to form the complete Huffman software. Then system testing was conducted
according to the following plan: we first designed various test cases with
which the software was tested. The test cases used were:
CASE(1)
Input:
• no. of symbols = a
Result: Success
For test case (1) we incorporated certain lines of code which check whether
the input for the number of symbols is a number or not. This validation was
performed using isdigit().
CASE (2)
Input:
• no of symbols = 0
Result: success
We simply check whether the input for the number of characters is 0,
display a message to enter a non-zero number, and then continue the
program to take new input.
CASE (3).
Input:
• no. of symbols = -1
Result: success
tn[0] contains the negative sign, so isdigit() returns false and the message
to enter a valid positive integer is displayed again.
CASE(4).
Input:
• no. of symbols = 1
Expected=error message
Result: success
CASE (5).
input
Result: success
strlen() is used to first calculate the length of the string entered; if it is
more than 1, a message is displayed to enter one symbol only.
CASE (6).
Input:
• no.of symbols = 98
Result: success
We check the condition n > 94; i.e., if the number of symbols specified is
more than 94, we display a message telling the user to enter a number of at
most 94.
CASE (7).
Input
• enter symbol = aa
Result: success
After all the above validations, we performed regression testing of all the
previous cases and verified the results. All the cases behaved as expected,
and thus system testing is complete.
Future scope:
The Huffman coding we have considered is simple binary Huffman coding,
but many variations of Huffman coding exist.
The n-ary Huffman algorithm uses the {0, 1, ... , n − 1} alphabet to encode
messages and build an n-ary tree. This approach was considered by Huffman
in his original paper. The same algorithm applies as for binary (n equals 2)
codes, except that the n least probable symbols are taken together, instead
of just the 2 least probable. Note that for n greater than 2, not all sets of
source words can properly form an n-ary tree for Huffman coding. In this
case, additional 0-probability place holders must be added. If the number of
source words is congruent to 1 modulo n-1, then the set of source words will
form a proper Huffman tree.
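The number of 0-probability placeholders required can be computed directly (a small illustrative helper, not from the original paper):

```cpp
#include <cassert>

// For n-ary Huffman coding, pad the m source words with the fewest
// placeholders k such that m + k is congruent to 1 modulo (n - 1),
// so that every merge can consume exactly n nodes.
int placeholders(int m, int n) {
    int r = (m - 1) % (n - 1);     // want (m + k - 1) % (n - 1) == 0
    return r == 0 ? 0 : (n - 1) - r;
}
```

For example, a ternary code (n = 3) over 6 source words needs one placeholder, since 6 is not congruent to 1 modulo 2 but 7 is; binary codes (n = 2) never need padding.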