Project 3, CSC 202
1 Huffman coding
2 Detailed Instructions
2.1 Count Occurrences
2.2 Data Definition for Huffman Tree
2.3 Build a Huffman Tree
2.4 Build a List for the Character Codes
2.5 Huffman Encoding
2.6 Huffman Decoding
3 2. Tests
4 3. Some Notes
5 4. Submission
6.9.0.6

Project 3, CSC 202

1 Huffman coding

For this project, you will implement a program to encode and decode text using Huffman coding . Huffman coding is a method of lossless data compression. Each character is assigned a variable length code consisting of ’0’s and ’1’s. The length of the code is based on how frequently the character occurs; more frequent characters are assigned shorter codes.

The basic idea is that you use a binary tree, where each leaf node represents a character and frequency count. The Huffman code of a character is then determined by the unique path from the root of the binary tree to that leaf node, where each ’left’ accounts for a ’0’ and a ’right’ accounts for a ’1’. In order to encode frequent characters with shorter codes, frequent characters should have a shorter path from the root to their node than infrequent characters, i.e. nodes representing frequent characters are positioned less deep in the tree than nodes for infrequent characters. The key problem is therefore to construct such a binary tree based on the frequency of occurrence of characters. A detailed example of how to construct such a Huffman tree is provided here , with the input text file and the resulting encoded file in ’0’ and ’1’ characters.

Your program will use your list implementations from lab 3, project 2, respectively. Copy both your array_list.py and linked_list.py file into your repository for project 3.

Note: For this assignment and every assignment in the class, every function must come with a signature and a purpose statement. You do not need to provide test cases for functions that produce output or consume input, but you must test (directly or indirectly) all functions that do not involve I/O (Input/Output), and you should make an attempt to design (or refactor) your code so that the functions that involve I/O are as small as possible.

2 Detailed Instructions

The following bullet points provide a guide to implement the individual functions of your program. Always follow the design recipe, write data definitions, function signatures and purpose statements in order to get full credit. Pick useful names for your defined data structures and functions, if there are no explicit names specified. If not otherwise stated, you will not get full credit if you use built-in functions. Start by creating a file huffman.py and add functions and data definitions to this file, if not otherwise stated.

2.1 Count Occurrences

2.2 Data Definition for Huffman Tree

2.3 Build a Huffman Tree

2.4 Build a List for the Character Codes

2.5 Huffman Encoding

2.6 Huffman Decoding

3 2. Tests

4 3. Some Notes

5 4. Submission

You must submit the following files:

 

 

  linked_list.py       — Linked List implementation

  linked_list_tests.py — Linked List test cases

  array_list.py        — Array List implementation

  array_list_tests.py  — Array List test cases

  huffman.py           — Your huffman encoder and decoder application including tests

  <Any text file you used for testing>

 

Note that the files you use for testing must end with .txt or .bin, or the handin server will cruelly destroy them.