
CS 11 MACHINE PROBLEM 2

DUE DATE: MARCH 17, 2014, 12 mn

GENERAL INSTRUCTIONS

1) Your submission should be a zipped file named CS 11 <section> MP 2 – <your name>.zip, submitted
via e-mail to janmichaelyap@gmail.com. It should contain your source code (.c files) and a PDF
file containing the external documentation of your code.

2) The source code itself should also contain documentation. Undocumented or poorly documented
source code may incur demerits.

3) Apart from your source code and its by-products, external documentation should also be included. The
external documentation should contain the following sections:
● Implementation – In this section, you are to describe how the solution to the problem is
implemented. You may use plain declarative paragraphs, a pseudocode (NOT your source
code), or a flowchart depicting the algorithm / procedure you used for the problem.
● Function Description – In formulating a program, you may have created some functions to
implement some parts of the code. In this section, list down all the functions created and a brief
description of what each does.
● Resources (Optional) – If you have consulted books, journals, web pages, and/or resource
persons (IF ANY) in coming up with the program, kindly cite them in this section. This section is
optional, that is, it may be omitted if you have formulated the solution independently and without
any external assistance.
● User Manual – A brief tutorial on what your program can do and how to use it.

4) The criteria for grading your MP submission are as follows:

● Functionality (60 pts.) - If the program works for all test inputs
● Documentation (30 pts.) - If your program is well documented, i.e. vital aspects of your program
have been properly described and explained
● Code Design (10 pts.) - If coding conventions (variable naming, indentations, etc.) are properly
followed

5) Assume correct input for all the problems.

6) Should there be any doubts, questions, or other feedback, email me at janmichaelyap@gmail.com or
consult me during class and/or consultation hours.

7) Any violation of intellectual property rights will be dealt with accordingly, as stated in the
course requirements and rules section of the syllabus.

MACHINE PROBLEM TITLE: TEXT FILE COMPRESSION AND DECOMPRESSION

You are to create a program that performs compression and decompression of ASCII text files. This is similar to
applications like WinZIP, WinRAR, and 7zip, which produce a compressed version of a file (e.g. .zip or .rar)
and can then reverse the process by decompressing the archive back to the original form of the file. The
(de)compression that you will be implementing is the Lempel-Ziv-Welch (LZW) algorithm. The following is a
description of how it works, based on https://www.cs.duke.edu/csed/curious/compression/lzw.html:

Compression

LZW starts out with a dictionary of 256 (ASCII) characters and uses those as the "standard" character set. It then
reads data one character at a time and encodes the data as the number that represents its index in the
dictionary. Every time it comes across a new substring (say, "tr"), it adds it to the dictionary; every time it comes
across a substring it has already seen, it just reads in a new character and concatenates it with the current string
to get a new substring. The next time LZW revisits a substring, it will be encoded using a single number. Usually
a maximum number of entries is defined for the dictionary, so that the process doesn't run away with memory. It
is necessary for the codes to be longer in bits than the characters, but since many frequently occurring
substrings will be replaced by a single code, in the long haul, compression is achieved.
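
The handout does not fix how a code is stored in the output file. As one possible illustration only, the sketch below assumes every code is written as two bytes, high byte first, which is wide enough for any dictionary of up to 65536 entries. The names pack_code, unpack_code, and write_code are hypothetical and not part of the problem statement.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: each code is stored in exactly 16 bits, high byte first. */

/* Store a code (0..65535) as two bytes. */
void pack_code(int code, unsigned char bytes[2])
{
    assert(code >= 0 && code <= 0xFFFF);
    bytes[0] = (unsigned char)((code >> 8) & 0xFF);
    bytes[1] = (unsigned char)(code & 0xFF);
}

/* Rebuild the code from the two bytes produced by pack_code. */
int unpack_code(const unsigned char bytes[2])
{
    return ((int)bytes[0] << 8) | (int)bytes[1];
}

/* Append one code to a file opened in binary mode.
   Returns 1 on success, 0 on a write error. */
int write_code(FILE *fp, int code)
{
    unsigned char bytes[2];
    pack_code(code, bytes);
    return fwrite(bytes, 1, 2, fp) == 2;
}
```

With this layout a code costs two bytes while a plain character costs one, so the output only shrinks when enough multi-character substrings are replaced by single codes. Narrower codes (for example 12 bits) would save more space but need bit-level packing.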

Here's what it might look like in pseudocode:

string s;
char ch;
...

s = empty string;
while (there is still data to be read) {
    ch = read a character;
    if (dictionary contains s+ch) {
        s = s+ch;
    } else {
        encode s to output file;
        add s+ch to dictionary;
        s = ch;
    }
}
encode s to output file;
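
The compression loop can be sketched in C roughly as follows. This is only one possible arrangement, working on memory buffers instead of files to keep it short. It assumes a dictionary capped at 4096 entries and represents each entry as a (prefix code, appended character) pair found by linear search. The names lzw_compress, dict_find, DICT_MAX, and FIRST_CODE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DICT_MAX   4096  /* assumed cap on the number of dictionary entries */
#define FIRST_CODE 256   /* codes 0..255 stand for the single characters */

/* Entry i (i >= 256) means: the string of code dict_prefix[i], then dict_ch[i]. */
static int dict_prefix[DICT_MAX];
static unsigned char dict_ch[DICT_MAX];
static int dict_size;

/* Return the code for (prefix string + ch), or -1 if it is not in the dictionary. */
static int dict_find(int prefix, unsigned char ch)
{
    int i;
    for (i = FIRST_CODE; i < dict_size; i++)
        if (dict_prefix[i] == prefix && dict_ch[i] == ch)
            return i;
    return -1;
}

/* Compress n bytes of in[] into codes[] and return the number of codes.
   codes[] needs room for n entries (worst case: one code per byte). */
size_t lzw_compress(const unsigned char *in, size_t n, int *codes)
{
    size_t i, count = 0;
    int s = -1;                       /* code of the current string; -1 = empty */

    assert(in != NULL || n == 0);
    dict_size = FIRST_CODE;           /* start with the 256 single characters */

    for (i = 0; i < n; i++) {
        unsigned char ch = in[i];
        int found;

        if (s < 0) {                  /* empty string + ch is always known */
            s = ch;
            continue;
        }
        found = dict_find(s, ch);
        if (found >= 0) {
            s = found;                /* s = s + ch */
        } else {
            codes[count++] = s;       /* encode s */
            if (dict_size < DICT_MAX) {   /* add s + ch while there is room */
                dict_prefix[dict_size] = s;
                dict_ch[dict_size] = ch;
                dict_size++;
            }
            s = ch;
        }
    }
    if (s >= 0)
        codes[count++] = s;           /* encode the last pending string */
    return count;
}
```

For the 7-character input ABABABA this produces the four codes 65, 66, 256, 258. A linear search is slow for large files; a hash table or trie would be the usual replacement, at the cost of more code.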

Decompression

The decompression process for LZW works by (re)constructing a dictionary identical to the one created during
compression. Both encoding and decoding programs must start with the same initial dictionary, in this case, all
256 ASCII characters.

The LZW decoder works this way: it first reads in an index, looks up the index in the dictionary, and outputs the
substring associated with the index. The first character of this substring is concatenated to the current working
string. This new concatenation is added to the dictionary, similar to how substrings were added during
compression. The decoded string then becomes the current working string (the current index, i.e. the substring,
is remembered), and the process repeats.

Here is what it might look like in pseudocode:

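string entry;
char ch;
int prevcode, currcode;
...

prevcode = read in a code;
decode/output prevcode;

while (there is still data to read) {
    currcode = read in a code;
    if (dictionary contains currcode) {
        entry = translation of currcode from dictionary;
    } else {
        /* currcode is the entry that is about to be added */
        entry = (translation of prevcode) + (first char of translation of prevcode);
    }
    output entry;
    ch = first char of entry;
    add ((translation of prevcode)+ch) to dictionary;
    prevcode = currcode;
}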
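
A matching decoder might look like the sketch below, again on memory buffers and with an assumed cap of 4096 entries. One case needs care: the encoder can emit a code one step before the decoder has added it to its dictionary (this happens with inputs such as ABABABA). In that case the entry must be the previous string followed by its own first character, and the sketch handles it explicitly. The names lzw_decompress, expand, DICT_MAX, and FIRST_CODE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DICT_MAX   4096  /* assumed cap; must match the one used when compressing */
#define FIRST_CODE 256   /* codes 0..255 stand for the single characters */

/* Entry i (i >= 256) means: the string of code dec_prefix[i], then dec_ch[i]. */
static int dec_prefix[DICT_MAX];
static unsigned char dec_ch[DICT_MAX];
static int dec_size;

/* Write the string for `code` into buf and return its length. */
static size_t expand(int code, unsigned char *buf)
{
    unsigned char tmp[DICT_MAX];
    size_t len = 0, i;

    while (code >= FIRST_CODE) {      /* walk the prefix chain, last char first */
        tmp[len++] = dec_ch[code];
        code = dec_prefix[code];
    }
    tmp[len++] = (unsigned char)code;
    for (i = 0; i < len; i++)         /* reverse into reading order */
        buf[i] = tmp[len - 1 - i];
    return len;
}

/* Decode ncodes codes into out[] and return the number of bytes written.
   out[] must be large enough for the original text. */
size_t lzw_decompress(const int *codes, size_t ncodes, unsigned char *out)
{
    unsigned char entry[DICT_MAX];
    size_t total = 0, i, len;
    int prev, curr;

    if (ncodes == 0)
        return 0;
    dec_size = FIRST_CODE;
    prev = codes[0];
    total += expand(prev, out);       /* the first code is a single character */

    for (i = 1; i < ncodes; i++) {
        curr = codes[i];
        assert(curr <= dec_size);     /* a valid stream never skips ahead */
        if (curr < dec_size) {
            len = expand(curr, entry);
        } else {
            /* curr is not in the dictionary yet: previous string + its first char */
            len = expand(prev, entry);
            entry[len] = entry[0];
            len++;
        }
        memcpy(out + total, entry, len);
        total += len;
        if (dec_size < DICT_MAX) {    /* add (previous string + first char of entry) */
            dec_prefix[dec_size] = prev;
            dec_ch[dec_size] = entry[0];
            dec_size++;
        }
        prev = curr;
    }
    return total;
}
```

Decoding the codes 65, 66, 256, 258 gives back ABABABA; the last code, 258, is the one that arrives before its dictionary entry exists.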

For this program, assume that the files to be compressed are text files and have the extension name .txt, for
example sample.txt. Upon compression by your program, the resulting compressed file should have the same
filename as the original file, but with the extension name .cmp, for example sample.cmp. Decompressing a .cmp
file should produce a .txt file of the same filename. Decompression should also output the content of the
decompressed file.
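
Deriving the output filename from the input filename could be done along these lines. This is a sketch under the assumptions that paths use '/' as the separator and that the caller supplies a large enough buffer; swap_extension is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy in_name into out with its extension replaced by new_ext
   (e.g. ".cmp"). Returns 0 on success, -1 if out is too small. */
int swap_extension(const char *in_name, const char *new_ext,
                   char *out, size_t out_size)
{
    const char *slash, *dot;
    size_t stem_len;
    int written;

    assert(in_name != NULL && new_ext != NULL && out != NULL);
    slash = strrchr(in_name, '/');    /* last path separator, if any */
    dot = strrchr(in_name, '.');      /* last dot, if any */

    /* Only treat the dot as an extension if it lies in the file name part. */
    if (dot != NULL && (slash == NULL || dot > slash))
        stem_len = (size_t)(dot - in_name);
    else
        stem_len = strlen(in_name);

    written = snprintf(out, out_size, "%.*s%s", (int)stem_len, in_name, new_ext);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

Called with poem.txt and .cmp it yields poem.cmp, and a leading path such as /home/Jan/ is kept unchanged.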

As a required feature, compression and decompression should be triggered when the program is executed from the
command line interface/prompt. This means that two additional parameters should be typed along with the
executable name: the work mode and the input file name. There are two possible work modes: compression mode,
denoted by -c, and decompression mode, denoted by -d. The input file name may include the path to the file to be
(de)compressed. Make sure that the file to be (de)compressed exists; otherwise, print a message stating that the
file does not exist.
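
One way to read the work mode and check the input file is sketched below. The helpers parse_mode and file_exists and the enum work_mode are hypothetical names; the sketch also assumes that being able to open the file for reading is an acceptable test for existence.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum work_mode { MODE_INVALID, MODE_COMPRESS, MODE_DECOMPRESS };

/* Interpret argv[1] as the work mode. Expects exactly
   "<program> <mode> <file>", i.e. argc == 3. */
enum work_mode parse_mode(int argc, char *argv[])
{
    if (argc != 3)
        return MODE_INVALID;
    if (strcmp(argv[1], "-c") == 0)
        return MODE_COMPRESS;
    if (strcmp(argv[1], "-d") == 0)
        return MODE_DECOMPRESS;
    return MODE_INVALID;
}

/* Return 1 if path can be opened for reading, 0 otherwise. */
int file_exists(const char *path)
{
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}
```

In main, the program would call parse_mode(argc, argv) first, then file_exists(argv[2]), and print the "does not exist" message and terminate if the second call returns 0.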

Sample run:

(Assume a poem.txt file exists and contains the poem in the fputs example in our File Input and Output lesson)

$ mp2 -c poem.txt
poem.txt compression done. poem.cmp file created.

$ mp2 -d poem.cmp
poem.cmp decompression done. The contents of the original file is as follows:
Roses are red,
Violets are blue,
Rhyming is overrated,
Zebra.
poem.txt recreated.

(Assume that no poem2.txt file exists in the folder Jan inside the home folder)

$ mp2 -c /home/Jan/poem2.txt
The file /home/Jan/poem2.txt does not exist! Terminating program...

Sidetask (+10 pts.): Upon compression, print out an additional message stating how many bytes the original
file was, how many bytes the compressed file is, and how much smaller, in terms of percentage, the
compressed file is compared to the original file. For example:

$ mp2 -c poem.txt
poem.txt compression done. poem.cmp file created. poem.txt was 4096 bytes, while
poem.cmp is 3072 bytes. Resulting compressed file is 25% smaller than the original
one.
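
The numbers in this message could be obtained as in the following sketch. It assumes that seeking to the end of a file opened in binary mode reports its size in bytes, which holds on common desktop systems but is not guaranteed by the C standard. The names stream_size and percent_smaller are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Return the size in bytes of an open binary stream, or -1 on error.
   The stream is rewound to the start afterwards. */
long stream_size(FILE *fp)
{
    long size;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    size = ftell(fp);
    rewind(fp);
    return size;
}

/* Percentage by which the compressed file is smaller than the original.
   A negative result means the "compressed" file is actually larger. */
double percent_smaller(long original, long compressed)
{
    assert(original > 0);
    return 100.0 * (double)(original - compressed) / (double)original;
}
```

For the sizes in the sample run, percent_smaller(4096, 3072) evaluates to 25.0, since (4096 - 3072) / 4096 = 0.25. Very small inputs can give a negative percentage, because the compressed file may end up larger than the original.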
