JV Compression Tool

Try compressing text using my Huffman compression library.

Built from scratch in Python using a custom Huffman tree, heap, and header format.

Compress

Max ~250 KB

Decompress

Paste base64

Why the samples behave the way they do

Tiny input → often gets bigger

The compressed output always includes a header with padding information and the data needed to rebuild the tree during decompression. For very small inputs, this overhead dominates, so the compressed output can be larger than the original.

Takeaway: Compression has a startup cost.

Highly repetitive → big wins

When one or a few symbols dominate the input, Huffman assigns them very short codes. This lowers the average bits per symbol and results in strong compression.

Takeaway: Skewed frequency distributions work best.
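To put a number on this, here is a minimal sketch that computes the size of the Huffman-coded payload without building the codes themselves. It relies on the fact that every merge of two nodes adds one bit to each symbol beneath the merged node, so the total payload size equals the sum of all merged weights. The function name `huffman_payload_bits` is hypothetical, the sketch uses the standard-library `heapq` rather than this library's custom heap, and it ignores the header entirely.

```python
import heapq
from collections import Counter

def huffman_payload_bits(data: bytes) -> int:
    """Total bits in the Huffman-coded payload, excluding any header."""
    weights = list(Counter(data).values())
    if len(weights) == 1:
        return weights[0]  # assumption: a lone symbol still costs one bit per occurrence
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        merged = heapq.heappop(weights) + heapq.heappop(weights)
        total += merged  # each merge adds one bit to every symbol under the new node
        heapq.heappush(weights, merged)
    return total

# 100 bytes where one symbol accounts for 90% of the input
skewed = b"a" * 90 + b"b" * 6 + b"c" * 3 + b"d"
bits = huffman_payload_bits(skewed)
print(bits, bits / len(skewed))  # 114 bits total, 1.14 bits per symbol
```

For comparison, the uncompressed input spends 8 bits on every symbol, so the payload here is roughly one seventh of the original size before header overhead is added.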

Uniform / random → little or no win

If symbols appear with roughly equal frequency, the Huffman codes all end up at nearly the same length, close to a fixed-width encoding. In that case there is little redundancy to remove, and the header overhead can make the result worse.

Takeaway: High entropy resists compression.
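One way to see this is to measure the Shannon entropy of the byte distribution, which is a lower bound on the average code length any symbol-by-symbol code such as Huffman can reach. The sketch below is illustrative only; `entropy_bits_per_symbol` is a hypothetical helper, not part of this library.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per symbol."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

uniform = bytes(range(256)) * 4      # every byte value appears equally often
lopsided = b"a" * 90 + b"b" * 10     # one symbol dominates

print(entropy_bits_per_symbol(uniform))   # 8.0, so no room below 8 bits per byte
print(entropy_bits_per_symbol(lopsided))  # well under 1 bit per symbol
```

When the entropy is already 8 bits per byte, the best possible payload is the same size as the input, and the header can only push the total above it.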

Large & skewed → best case

With enough input data, the header cost becomes negligible. Strong frequency imbalance then drives excellent compression ratios.

Takeaway: Size helps, but structure matters more.

How this implementation works

These notes describe how this implementation of Huffman coding works in practice.

Frequency counting

The compressor starts by counting how often each byte value appears in the input. These frequencies, together with how ties between equal weights are broken, determine the structure of the Huffman tree and the final codes.

Common symbols receive shorter codes, while rare symbols receive longer ones.
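The counting step can be sketched in a few lines. This is an illustration using the standard-library `Counter`, not the library's own code, and `count_frequencies` is a hypothetical name.

```python
from collections import Counter

def count_frequencies(data: bytes) -> Counter:
    """Map each byte value (0-255) to the number of times it occurs."""
    # Iterating over a bytes object yields integers, one per byte.
    return Counter(data)

freqs = count_frequencies("abracadabra".encode("utf-8"))

# most_common() lists byte values from most to least frequent
for byte_value, occurrences in freqs.most_common():
    print(chr(byte_value), occurrences)
```

Text is encoded to bytes first, so a non-ASCII character counts as several byte symbols rather than one.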

Nodes & tree construction

Each symbol begins as a leaf node containing its frequency. Nodes are merged using a min-heap, repeatedly combining the two lowest-weight nodes into a new internal node.

This continues until a single Huffman tree remains, defining a prefix-free code for all symbols.
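The merge loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: it uses the standard-library `heapq` instead of a custom heap, represents a tree as either a byte value (leaf) or a `(left, right)` tuple, and breaks ties with an insertion counter. The name `build_codes` is hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def build_codes(data: bytes) -> dict:
    """Build a Huffman code table {byte value: bit string} for data."""
    freqs = Counter(data)
    if not freqs:
        return {}
    tiebreak = count()  # unique sequence number, so the heap never compares trees
    heap = [(weight, next(tiebreak), symbol) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    # Repeatedly combine the two lowest-weight nodes into a new internal node.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    _, _, root = heap[0]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: 0 goes left, 1 goes right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: record the path taken to reach it
            codes[node] = prefix or "0"  # assumption: a lone symbol gets the code "0"

    walk(root, "")
    return codes

for symbol, code in sorted(build_codes(b"abracadabra").items()):
    print(chr(symbol), code)
```

Because symbols only ever sit at leaves, no code can be a prefix of another, which is what lets the decoder read the bit stream without separators.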

For a visual explanation, see the interactive Huffman tree example on W3Schools.

Header structure & overhead

The compressed output includes a header containing padding length information and the data required to reconstruct the Huffman tree during decompression. This makes decompression deterministic and self-contained, but it also adds overhead that is especially noticeable for small inputs.
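To make the overhead concrete, here is one possible header layout. It is purely hypothetical and not this library's actual format: one byte for the number of padding bits, two bytes for the number of distinct symbols, then one byte plus a four-byte frequency per symbol. Storing frequencies lets the decompressor rebuild the same tree, provided both sides break ties the same way. The names `pack_header` and `unpack_header` are assumptions as well.

```python
import struct
from collections import Counter

def pack_header(freqs: dict, padding_bits: int) -> bytes:
    """Serialize the padding length and frequency table (hypothetical layout)."""
    # ">BH": 1 byte for padding bits (0-7), 2 bytes for the distinct-symbol count
    parts = [struct.pack(">BH", padding_bits, len(freqs))]
    for symbol, occurrences in sorted(freqs.items()):
        # ">BI": 1 byte for the symbol, 4 bytes for its frequency
        parts.append(struct.pack(">BI", symbol, occurrences))
    return b"".join(parts)

def unpack_header(blob: bytes):
    """Return (freqs, padding_bits, header_length) read from the start of blob."""
    padding_bits, n_symbols = struct.unpack_from(">BH", blob, 0)
    offset = 3
    freqs = {}
    for _ in range(n_symbols):
        symbol, occurrences = struct.unpack_from(">BI", blob, offset)
        freqs[symbol] = occurrences
        offset += 5
    return freqs, padding_bits, offset

text = b"hi"
# Two symbols with one-bit codes give 2 payload bits, so 6 padding bits fill the byte.
header = pack_header(Counter(text), padding_bits=6)
print(len(text), len(header))  # 2 bytes of input, 13 bytes of header
```

Under this layout a two-character input carries a 13-byte header plus one payload byte, which is why tiny inputs come out larger than they went in.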
