Languages Assembler x86 Programming Sourcecode Compression  

Limpel-Ziv file compress/decompress w/ASM src

download download  
important code information
author:
Unknown
minimum requirements:
Limpel-Ziv file compress/decompress w/ASM src description

These programs are a file compressor and decompressor based on the Limpel-Ziv algorithm.


These programs are a file compressor and decompressor based on the
Limpel-Ziv algorithm.

The compression algorithm builds a string translation table that maps
substrings from the input stream into fixed-length codes. These codes
are used by the decompression algorithm to rebuild the compressor's table
and reconstruct the original data stream. In it's simplest form, the
algorithm can be stated as:
"If <w>k is in the table, then <w> is in the table."

The following comments were paraphrased from the VMS LZCOMP program
sources. The algorithm is from the article "A Technique for High
Performance Data Compression" by Terry A. Welch which appeared in IEEE
Computer Volume 17, Number 6 (June 1984), pp 8-19.

Compress:
1) Initialize table to single character strings.
2) Read first character. Set <w> (the prefix string) to that
character.
3) Read next character, k.
4) If at end of file, output code <w> and exit.
5) If <w>k is in the table, set <w> to <w>k; goto step 3.
6) Output code <w>.
7) Put <w>k into table.
8) Set <w> to k. Goto step 3.

<w> is a code which represents a string in the table. When a new
character k is read in, the table is searched for <w>k. If this
combination is found, <w> is set to the code for that combination and the
next character is read in. Otherwise, this combination is added to the
table, the code <w> is written to the output stream and <w> is set to k.

The decompression algorithm builds an identical table by parsing each
received code into a prefix string and extension character. The
extension character is pushed onto the stack and the prefix string
translated again until it is a single character. This completes the
decompression. The decompressed code is then output by popping the stack
and a new entry is made in the table.

The algorithm breaks when the input stream contains the sequence
k<w>k<w>k, where k<w> is already in the compressor's table. This
sequence is handled in step 3 below.

Decompress:
1) Read first input code, assign to <w>, k, oldcode and
finchar. Output k.
2) Read next code <w>, assign to incode. If end of file, exit.
3) If <w> not in table (k<w>k<w>k case),
a) Push finchar onto stack.
b) Set <w> and incode to oldcode.
4) If <w> in table,
a) Push <w>.char onto stack.
b) Set <w> to <w>.code.
c) Goto step 4.
5) Set oldcode and finchar to <w>, output finchar.
6) While characters on stack pop stack and output character.
7) Add <oldcode>k to table.
8) Set oldcode to incode.
9) Goto step 2.

The algorithm used here has one additional feature. The output codes are
variable length. They start at nine bits. Once the number of codes
exceeds the current code size, the number of bits in the code is
increased. When the table is completely full, a clear code is
transmitted for the decompressor and the table is reinitialized. This
program uses a maximum code size of 12 bits for a total of 4096 codes.

The decompressor realizes that the code size is changing when it's table
size reaches the maximum for the current code size. At this point, the
code size is increased. Remember that the decompressor's table is
identical to the compressor's table at any point in the original data
stream.

My sources of reference while implementing these programs were the
following:

Bernie Eiben, DEC
Unix (tm) COMPRESS program sources
VMS LZCOMP/LZDCMP program sources (Martin Minow, DEC)
ARC file archiver sources (Thom Henderson, System Enhancement
Associates)

Toad Hall Tweaks:
LZCOMP2:
- Still prompts for input,output file names.
- Rewritten as a .COM file (tighter, faster).
- Output is identical to LZCOMP.ASM
LZCOMP3:
- Takes input filename from command line (else uses StdIn).
- Outputs per DOS redirection (e.g., LZCOMP3 BOGUS.TXT >BOGUS.LZW)
- Won't work with "<BOGUS.TXT" input redirection or as a filter
(sigh...)
- Tighter and faster yet.

LZDCMP2,
LZDCMP3:
- Same tweaks as equivalent LZCOMP files.

Kudos to the original author:
Tom Pfau
Digital Equipment Corporation
Parsippany, NJ

... nice hack! Sure beats the horrible C stuff I've only been able
to find to date for any sort of file compression!

I'm now contemplating some comments I read in a recent ARC utility
(GSARC) where the author expanded the number of code bits from 9..2
to 3..13 (as I recall); and he doesn't discard the table on overflow.
Does some tricky other stuff (not clear yet).

David Kirschbaum
Toad Hall
kirsch@braggvax.ARPA



File List:
FILE_ID.DIZ48b
LZ.TXT5Kb
LZCOMP1.ASM7Kb
LZCOMP1.EXE4Kb
LZCOMP2.ASM8Kb
LZCOMP3.ASM8Kb
LZDCMP1.ASM6Kb
LZDCMP1.EXE4Kb
LZDCMP2.ASM6Kb
LZDCMP3.ASM7Kb
MACROS1.MLB7Kb
MACROS2.MLB7Kb

Similar code
Data compressing algorithms LZARI/LZSS/LZHUF - H.Okumura (Popularity: ) : Incluces both a tutorial and sourcecode.
CPUID for intel (Popularity: ) : Source to detect which intel chip is in a computer - Detects the processor for sure, but does it by checking for invalid opcodes.
386ID -- Return The 386/486 Component Identifier (Popularity: ) : assembler source.
User reviews

Write a review:
1 2 3 4 5 6 7 8 9 10
1=poor 10=excellent
Write review*
Your name*
Email*
  (Comments are moderated, and will not appear on this site until the editor has approved them)
 
Rate me
supported os
stats
downloads 400
version
size in Kb 24
popularity   3839/7913374
user rating 5/10
ad


New Code
Popular Code