public class LZ77Compressor
extends java.lang.Object
Most LZ77 derived algorithms split input data into blocks of
uncompressed data (called literal blocks) and back-references
(pairs of offsets and lengths) that state "add length
bytes that are the same as those already written starting
offset
bytes before the current position. The details
of how those blocks and back-references are encoded are quite
different between the algorithms and some algorithms perform
additional steps (Huffman encoding in the case of DEFLATE for
example).
This class attempts to extract the core logic - finding back-references - so it can be re-used. It follows the algorithm explained in section 4 of RFC 1951 (DEFLATE) and currently doesn't implement the "lazy match" optimization. The three-byte hash function used in this class is the same as the one used by zlib and InfoZIP's ZIP implementation of DEFLATE. The whole class is strongly inspired by InfoZIP's implementation.
LZ77 is used vaguely here (as well as many other places that talk about it :-), LZSS would likely be closer to the truth but LZ77 has become the synonym for a whole family of algorithms.
The API consists of a compressor that is fed byte
s
and emits LZ77Compressor.Block
s to a registered callback where the blocks
represent either literal blocks
, back-references
or end of data
markers
. In order to ensure the callback receives all information,
the #finish
method must be used once all data has been fed
into the compressor.
Several parameters influence the outcome of the "compression":
windowSize
windowSize
- real world values are
in the area of 32k.minBackReferenceLength
maxBackReferenceLength
maxOffset
maxLiteralLength
Modifier and Type | Class and Description |
---|---|
static class |
LZ77Compressor.BackReference
Represents a back-reference.
|
static class |
LZ77Compressor.Block
Base class representing blocks the compressor may emit.
|
static interface |
LZ77Compressor.Callback
Callback invoked while the compressor processes data.
|
static class |
LZ77Compressor.EOD
A simple "we are done" marker.
|
static class |
LZ77Compressor.LiteralBlock
Represents a literal block of data.
|
Modifier and Type | Field and Description |
---|---|
private int |
blockStart |
private LZ77Compressor.Callback |
callback |
private int |
currentPosition |
private static int |
H_SHIFT |
private static int |
HASH_MASK |
private static int |
HASH_SIZE |
private int[] |
head |
private boolean |
initialized |
private int |
insertHash |
private int |
lookahead |
private int |
matchStart |
private int |
missedInserts |
private static int |
NO_MATCH |
(package private) static int |
NUMBER_OF_BYTES_IN_HASH |
private Parameters |
params |
private int[] |
prev |
private static LZ77Compressor.Block |
THE_EOD |
private byte[] |
window |
private int |
wMask |
Constructor and Description |
---|
LZ77Compressor(Parameters params,
LZ77Compressor.Callback callback)
Initializes a compressor with parameters and a callback.
|
Modifier and Type | Method and Description |
---|---|
private void |
catchUpMissedInserts() |
private void |
compress() |
void |
compress(byte[] data)
Feeds bytes into the compressor which in turn may emit zero or
more blocks to the callback during the execution of this
method.
|
void |
compress(byte[] data,
int off,
int len)
Feeds bytes into the compressor which in turn may emit zero or
more blocks to the callback during the execution of this
method.
|
private void |
doCompress(byte[] data,
int off,
int len) |
void |
finish()
Tells the compressor to process all remaining data and signal
end of data to the callback.
|
private void |
flushBackReference(int matchLength) |
private void |
flushLiteralBlock() |
private void |
initialize() |
private int |
insertString(int pos)
Inserts the current three byte sequence into the dictionary and
returns the previous head of the hash-chain.
|
private void |
insertStringsInMatch(int matchLength) |
private int |
longestMatch(int matchHead)
Searches the hash chain for real matches and returns the length
of the longest match (0 if none were found) that isn't too far
away (WRT maxOffset).
|
private int |
longestMatchForNextPosition(int prevMatchLength) |
private int |
nextHash(int oldHash,
byte nextByte)
Assumes we are calculating the hash for three consecutive bytes
as a rolling hash, i.e.
|
void |
prefill(byte[] data)
Adds some initial data to fill the window with.
|
private void |
slide() |
private static final LZ77Compressor.Block THE_EOD
static final int NUMBER_OF_BYTES_IN_HASH
private static final int NO_MATCH
private final Parameters params
private final LZ77Compressor.Callback callback
private final byte[] window
private final int[] head
private final int[] prev
private final int wMask
private boolean initialized
private int currentPosition
private int lookahead
private int insertHash
private int blockStart
private int matchStart
private int missedInserts
private static final int HASH_SIZE
private static final int HASH_MASK
private static final int H_SHIFT
public LZ77Compressor(Parameters params, LZ77Compressor.Callback callback)
params
- the parameterscallback
- the callbackjava.lang.NullPointerException
- if either parameter is null
public void compress(byte[] data) throws java.io.IOException
data
- the data to compress - must not be nulljava.io.IOException
- if the callback throws an exceptionpublic void compress(byte[] data, int off, int len) throws java.io.IOException
data
- the data to compress - must not be nulloff
- the start offset of the datalen
- the number of bytes to compressjava.io.IOException
- if the callback throws an exceptionpublic void finish() throws java.io.IOException
The compressor will in turn emit at least one block (LZ77Compressor.EOD
) but potentially multiple blocks to the callback during
the execution of this method.
java.io.IOException
- if the callback throws an exceptionpublic void prefill(byte[] data)
This is used if the stream has been cut into blocks and back-references of one block may refer to data of the previous block(s). One such example is the LZ4 frame format using block dependency.
data
- the data to fill the window with.java.lang.IllegalStateException
- if the compressor has already started to accept dataprivate int nextHash(int oldHash, byte nextByte)
The hash is shifted by five bits on each update so all effects of A have been swapped after the third update.
private void doCompress(byte[] data, int off, int len) throws java.io.IOException
java.io.IOException
private void slide() throws java.io.IOException
java.io.IOException
private void initialize()
private void compress() throws java.io.IOException
java.io.IOException
private int insertString(int pos)
Updates insertHash
and prev
as a
side effect.
private int longestMatchForNextPosition(int prevMatchLength)
private void insertStringsInMatch(int matchLength)
private void catchUpMissedInserts()
private void flushBackReference(int matchLength) throws java.io.IOException
java.io.IOException
private void flushLiteralBlock() throws java.io.IOException
java.io.IOException
private int longestMatch(int matchHead)
Sets matchStart to the index of the start position of the longest match as a side effect.