Overview
Huffman Coding: It is a lossless data compression method which uses variable length character coding to encode data based on the probability or frequency of each character. Huffman coding uses prefix codes to encode data.
Prefix code: Assignment of codewords to characters such that no codeword is a prefix of another. Prefix codes can be described as a binary tree where the codewords are the leaf nodes of a tree and the left branch means“0”and the right branch means “1” as shown in figure below.
The prefix code can be decoded for a character by traversing from the root node to the node with that character. For example, in the figure above the prefix code of b can be obtained traversing from root to b. The prefix code is thus, 001.
Prerequisites:
● Heap tree
Procedure:
Let us consider the following characters and their frequencies as shown in table below,
To generate huffman codes,
(1) build the mean heap of characters based on their frequencies.
(2) Extract two minimum nodes from the heap
(3) Build the huffman tree by creating a new internal node with the first extracted node as its left child and second one as its right child such that the of the internal node is equal to the sum of the frequency of the child nodes.
(4) Finally, push the internal node into the min heap.
Step 1:The table represents the min heap in tabular format where d is the root node.
We extract the two minimum nodes with minimum frequency from the min heap which are cand d. We create a tree with a new internal node represented by a dummy character ‘$1’ such that frequency($1) = frequency(c) + frequency(d)= 5 + 5 = 10. Next, we push $1 to the min heap.
Step 2:
The new min heap after step 1 is shown below. We extract the two minimum nodes with minimum frequency from the min heap which are band $1. We create a new internal node represented by a dummy character ‘$2’ such that frequency($2) = frequency($1)+frequency(b)= 10 + 40 = 50. Next, we push $2 to the min heap.
Step 3:
The new min heap after step 2 is shown below. We extract the two minimum nodes with minimum frequency from the min heap which are a and $2. We create a new internal node represented by a dummy character ‘$3’ such that frequency($3) = frequency($2) + frequency(a) = 50 + 50 = 100. Next, we push $2 to the min heap.
Step 4:
The new min heap after step 1 is shown below. Since, there is only one node in the min heap i.e the heap size is 1, we terminate the process.
Next, we assign prefix code to the huffman tree. We start from the root node, assign 0 to every left edge and 1 to every right edge as shown in figure below.
The prefix codes of each character are the values of edges traversed to reach the character from the root node. The prefix codes are,
To find the average bits per character we use the following formula,
Total Number of bits used in encoding the characters = frequency of characters x length of prefix codes = 50 x 1 + 40 x 2 + 5 x 3 + 5 x 3 = 160.
Total number of characters = sum of the frequencies of the characters = 50 + 40 + 5 + 5 = 100.
Average bits per character = Total Number of bits used in encoding the characters / Total number of characters = 160 / 100 = 1.6
Algorithm:
Input:
1. An array of unique characters along with their frequency of occurrences
Output:
1. Displays the prefix code of each character
2. Displays the average bits per character
Algo:
Step 1: Start
Step 2: Define user defined datatype Node consisting of char data, intfreq and pointers left & right of type Node
Step 3: Defining user defined datatypeMinHeap consisting of int size & capacity and an array nodes of Node
Step 4: Set inttotalBits 0
Step 5:Defining heapify function
voidheapify(structMinHeap* minHeap, int i)
Set smallest i
Set left (2 * i) + 1
Set right (2 * i) + 2
Set temp Null
Check if (left <minHeap.size AND minHeap.nodes[left].freq< minHeap.nodes[smallest].freq)
Set smallest left
Check if (right <minHeap.size AND minHeap.nodes[right].freq< minHeap.nodes[smallest].freq)
Set smallest right
Check if (smallest ≠ i)
Set temp minHeap.nodes[smallest]
Set minHeap.nodes[smallest] minHeap.nodes[i]
Set minHeap.nodes[i] temp
Call heapify(minHeap, smallest)
Step 6:extractMin function
struct Node* extractMin(structMinHeap* minHeap)
Set temp minHeap.nodes[0]
Set minHeap.nodes[0] minHeap.nodes[minHeap.size – 1]
Set minHeap.sizeminHeap.size – 1
Call heapify(minHeap, 0)
Return temp
Step 7: Defining insertMinHeap function
voidinsertMinHeap(structMinHeap* minHeap, struct Node* parent)
Set minHeap.sizeminHeap.size + 1
Set i minHeap.size – 1
While (parent.freq<minHeap.nodes[(i - 1) / 2].freq AND i > 0) do
Set minHeap.nodes[i] minHeap.nodes[(i - 1) / 2]
Set i (i – 1) / 2
End While
Set minHeap.nodes[i] parent
Step 8:Defining buildMinHeap function
voidbuildMinHeap(structminHeap *minHeap)
Set n minHeap.size – 1
for i = (n - 10) / 2 to i >= 0 repeat
Call heapify(minHeap, i)
Set i i – 1
Step 9: Defining createMinHeap function
structMinHeap* createMinHeap(char chars[], intfreq[], intlen)
Set minHeapGetNode(MinHeap)
Set minHeap.size 0
Set minHeap.capacitylen
Set minHeap.nodesGetNode(minHeap.capacity * Node)
for i = 0 to i <len repeat
Set minHeap.nodes[i] GetNode(Node)
Set minHeap.nodes[i].left Null
Set minHeap.nodes[i].right Null
Set minHeap.nodes[i].freqfreq[i]
Set minHeap.nodes[i].data chars[i]
Set i i + 1
minHeap.sizelen
returnminHeap
Step 10: Defining buildHuffmanTree function
struct Node* buildHuffmanTree(char chars[], intfreq[], intlen)
Set Node *left Null
Set Node *right Null
Set Node *parent Null
Set Node *root Null
Set MinHeap *minHeap call createMinHeap(chars, freq, len)
Call buildMinHeap(minHeap)
While (minHeap.size ≠ 1) do
Set left extractMin(minHeap)
Set right extractMin(minHeap)
Set parent GetNode(Node)
Set parent.left left
Set parent.right right
Set parent.data ‘$’
Set parent.freqleft.freq + right.freq
Call insertMinHeap(minHeap, parent)
End While
Set root extractMin(minHeap)
Return root
Step 11:Defining displayPrefixCode function
voiddisplayPrefixCode(struct Node* root, int prefix[], int top)
Check if (root.left ≠ Null)
Set prefix[top] 0
Call displayPrefixCode(root.left, prefix, top + 1)
Check if (root.right ≠ Null)
Set prefix[top] 1
Call displayPrefixCode(root.right, prefix, top + 1)
Check if ((root.left = Null) AND (root.right = Null))
Display “Prefix code of ”root.data “ = ”
for i = 0 to i < top repeat
Display prefix[i]
Set i = i + 1
Set totalBitstotalBits + (root.freq * i)
Step 12: Defining main function
Display “Enter Number Of Characters”
Read len
Set char chars[len] Null
Set intfreq[len] Null
Set intprefix[height] Null
Set int top 0
Set Node *root Null
for i = 0 to i <len repeat
Display “Character ” (i + 1)
Read chard[i]
Display “Frequency ” (i+1)
Read freq[i]
Set totalFreqtotalFreq + freq[i]
Set i i + 1
Set root Call buildHuffmanTree(chars, freq, len)
Call displayPrefixCode(root, prefix, top)
Set avgtotalBits / totalFreq
Display “Average Bits Per Character =” avg
Step 13: Stop
Characteristics of Huffman Coding:
● Huffman coding algorithm requires a priority queue of a heap tree to store and extract the characters with least frequency
● It requires an extra space for storing the characters and their frequencies in an array to form the heap. The space complexity is O(n) where n is the number of characters.
● Huffman coding uses a form of optimal prefix codes to encode data.
Huffman coding Applications:
● Arithmetic coding
● Lossless data compression (for example, DEFLATE)
Huffman Coding Advantage:
● Huffman coding results in lossless data compression.
● Huffman coding assigns shorter prefix code to words with higher frequencies which reduces storage space.
Huffmand Coding Disadvantage:
● Huffman coding algorithm needs to know in advance the frequency of each character.
Complexity of Huffman Coding:
Time Complexity of Huffman Coding:
Let n be the number of character. In the buildHuffmanTree() function, the buildMinHeap() function is called one time which takes a time complexity of O(n). The extractMin() function takes a time complexity of O(log n) . Inside the buildHuffmanTree() function, the extractMin() function is called 2*(n-1) time which takes a time complexity of 2*(n-1)*O(log n) = O(n log n). Hence, the total time complexity of the algorithm is O(n) + O(nlogn) = O(nlogn).
Space Complexity of Huffman Coding:
The input characters are stored in heap tree of size n where n is the number of characters. Hence, the space complexity of the algorithm is O(n).