Shivam Verma

LZW

LZW (Lempel–Ziv–Welch) is a universal lossless data compression technique. This compression algorithm was developed by Abraham Lempel, Jakob Ziv, and Terry Welch. In hardware implementations, the algorithm is simple and has the potential for very high throughput. It is the algorithm used in the GIF image format and is part of the widely used Unix file compression utility compress.

Why Do We Need a Compression Algorithm?

Although modern computers can store an increasing number of files, file size is still an issue. We can store more files if our files are smaller. We use various compression algorithms to reduce the amount of space required to represent a file. Lossless and lossy compression algorithms are the two types of compression algorithms. Lossless compression algorithms reduce file size without hampering any information in the file. In contrast, lossy compression algorithms reduce the file size by removing unnecessary or less important information in the file. There are various reasons why we need data compression algorithms, some of which are mentioned below.

  • Data compression reduces the amount of space that files take up on a hard drive and the amount of time taken to transfer or download them.
  • To reduce storage costs, we need compression algorithms that allow us to store large amounts of data at a lower cost.
  • For example, in a 2:1 compression ratio, a 20 MB file takes up 10 MB of space, allowing us to store large files in less storage space.

Some lossy compression algorithms are:

  • Vector Quantisation
  • DCT (Discrete Cosine Transform)
  • Transform Coding

Some lossless compression algorithms are:

  • LZW (Lempel Ziff Welch)
  • RLE (Run Length Encoding)
  • String-table compression

What is Lempel–Ziv–Welch (LZW) Algorithm?

Lempel–Ziv–Welch (LZW) Algorithm is a common lossless data compression algorithm. As it is a lossless compression algorithm, there is no data loss during compression. LZW (Lempel–Ziv–Welch) is named after the scientists who developed it, Abraham Lempel, Jakob Ziv, and Terry Welch. LZW is a ‘dictionary-based’ lossless compression algorithm that scans a file for data patterns that appear more than once. These patterns are then saved in a dictionary, and references are placed within the compressed file wherever repetitive data occurs.
In hardware implementations, the algorithm is simple and has the potential for very high throughput. It is the algorithm used in the GIF image format and is part of the widely used Unix file compression utility compress.

How Does It Work?

The LZW Algorithm is divided into two parts: the encoding algorithm, which converts strings to integer codes, and the decoding algorithm, which does the opposite. Both encoder and decoder algorithms have a default table or dataset that serves as the initial model for both encoder and decoder. As the algorithm runs, new integer codes for various string patterns are added to this table.
The number of table entries in the code table is usually set to 4096. When encoding begins, the code table only contains the first 256 entries, with the remaining entries being blanks. LZW detects repeated sequences in the data and adds them to the code table as the encoding progresses. Each code from the compressed file is decoded through the code table to determine which character or characters it represents.

Implementation

LZW Compression
When the input data is processed, the compression algorithm keeps a dictionary corresponding to the longest words encountered with a list of code values. Because the words are swapped out for their matching codes, the input file is compressed. As a result, the algorithm becomes more effective as the volume of long, repeated words in the input stream increases.

Pseudo-code for LZW encoding

Initialize the string table with a single character.
    str1 = first input symbols
    WHILE there are still input symbols
        str2 = next input symbols
        IF str1 + str2 is in the table
            str1 = str1 + str2
        ELSE
            output code for str1
        add str1 + str2 to the table
            str1 = str2
    END WHILE
output the code for str1 

Example 1: Compress the string ABABBABBB using the LZW technique.
The steps are depicted in the diagram below in a proper sequence.

compress-string-using-lzw-technique-steps
compress-string-using-lzw-technique-steps2
compress-string-using-lzw-technique-steps3

LZW Decompression
During decompression, the LZW (Lempel–Ziv–Welch) decompressor generates the same string table. Single characters are initialized for the first 256 string table entries. Except for the first character in the input string, the string table is updated. Decoding is accomplished by reading the codes and converting them using the code table that is being created.
The following C++ code is provided for LZW compression encoding and decoding:

Pseudo-code for LZW decoding

Initialize the string table with a single character.
OLDCODE = first encoded symbol
output translation of OLDCODE
WHILE till the end of the input stream
    NEWCODE = next encoded symbol
    IF NEWCODE is not in the table
        str1 = translation of OLDCODE
        str1 = str1 + str2
    ELSE
        str1 = translation of NEWCODE
    output str1
    str2 = first character of str1
    OLDCODE + str2 to the table
    OLDCODE = NEWCODE
END WHILE

Example 2: Decompress the output sequence of: 65>66>256>257>66>260> using LZW.
The steps involved are systematically shown in the diagram below.

steps-to-decompress-the-output-sequence

The following is a C++ code for LZW compression, both encoding, and decoding.

#include <bits/stdc++.h>
using namespace std;
void compression(string s,vector<int> &output_code)
{
    cout << "Compression" << endl;
    map<string, int> table;
    for (int i = 0; i < 256; i++) {
        string str = "";
        char ch = i;
        str+= ch;
        table[str] = i;
    }

    string str1 = "", str2 = "";
    str1 += s[0];
    int code = 256;
    cout << "String\tOutput_Code\t Extended_Code\n";
    for (int i = 0; i < s.size(); i++) 
    {
        if (i != s.size() - 1)
            str2 += s[i + 1];
        if (table.find(str1 + str2) != table.end()) 
        {
            str1 = str1 + str2;
        }
        else {
            cout << str1 << "\t\t\t" << table[str1] << "\t\t\t"
                 << str1 + str2 << ": " << code << endl;
            output_code.push_back(table[str1]);
            table[str1 + str2] = code;
            code++;
            str1 = str2;
        }
        str2 = "";
    }
    cout << str1 << "\t\t\t" << table[str1] << endl;
    output_code.push_back(table[str1]);
    cout << "Output Codes: ";
    int n=output_code.size();
    for (int i = 0; i < n; i++) 
    {
        cout << output_code[i] << " ";
    }
    cout<<endl;
}

void decompression(vector<int> output_code)
{
    cout << endl << "Decompression"<<endl;
    map<int, string> table;
    for (int i = 0; i < 256; i++) {
        string str = "";
        char ch=i;
        str += ch;
        table[i] = str;
    }
    int old_code = output_code[0], n;
    string str1 = table[old_code];
    string str2 = "";
    str2 += str1[0];
    cout << str1;
    int count = 256;
    for (int i = 0; i < output_code.size() - 1; i++) {
        n = output_code[i + 1];
        if (table.find(n) == table.end()) 
        {
            str1 = table[old_code];
            str1 = str1 + str2;
        }
        else 
        {
            str1 = table[n];
        }
        cout << str1;
        str2 = "";
        str2 += str1[0];
        table[count] = table[old_code] + str2;
        count++;
        old_code = n;
    }
}
int main()
{
    string str= "ABD-ABKABD-ABDABDK";
    vector<int> output_code;
    compression(str,output_code);
    decompression(output_code);
}

Output

Compression
String    Output_Code  Extended_Code
A            65          AB: 256
B            66          BD: 257
D            68          D-: 258
-            45          -A: 259
AB            256         ABK: 260
K            75          KA: 261
AB            256         ABD: 262
D-            258         D-A: 263
ABD            262         ABDA: 264
ABD            262         ABDK: 265
K            75
Output Codes: 65 66 68 45 256 75 256 258 262 262 75 

Decompression
ABD-ABKABD-ABDABDK

Advantages of LZW over Huffman

The LZW algorithm has the following advantages:

  • The algorithm is straightforward, simple, and effective.
  • LZW does not require any prior knowledge of the input data stream.
  • The LZW algorithm has a 60 – 70% compression ratio for some text files.
  • The LZW algorithm performs better for files with a lot of repetitive data.
  • The LZW algorithm compresses the data in a single pass.

Conclusion

  • Lempel–Ziv–Welch (LZW) Algorithm is a common lossless data compression algorithm.
  • Abraham Lempel, Jakob Ziv, and Terry Welch developed the LZW compression algorithm.
  • Lossless compression algorithms reduce file size without hampering any information in the file.
  • Lempel–Ziv–Welch (LZW) algorithm is used in the GIF image format and is part of the widely used Unix file compression utility compress.
  • LZW is a ‘dictionary-based’ lossless compression algorithm that scans a file for data patterns that appear more than once.
  • The LZW algorithm performs better for files with a lot of repetitive data.
  • The LZW Algorithm is divided into two parts: the encoding algorithm, which converts strings to integer codes, and the decoding algorithm, which does the opposite.
  • We should not prefer the LZW algorithm for compression when there is no repetitive data.

Author