Jesse Vahlfors

jv-compression-tool

This project explores how lossless compression works behind the scenes by implementing a complete Huffman compressor and decompressor in Python, with a clear, modular design and full test coverage.

Project Overview

For recruiters: This project demonstrates my ability to design clean, modular architectures and implement complex algorithms without relying on ready-made library implementations. jv-compression-tool is a fully tested Huffman compression library built entirely from scratch in Python, developed with a focus on clarity, reliability, and test-driven development.

Key skills demonstrated include:

  • Algorithm implementation: custom min-heap, Huffman tree generation, bit-level encoding/decoding.
  • Software architecture: clear module boundaries, reusable utilities, and maintainable project layout.
  • Test-driven development: comprehensive unit and integration test suite covering edge cases such as empty and single-symbol input.
  • Binary data handling: packing and unpacking bitstreams, building a reliable header format.
  • Package design: structured as a distributable Python library with documentation and versioning.

Together, these choices show my ability to work through complex problems independently, write clean and well-tested code, and build practical developer tools from the ground up.

───────────────────────────────────────

jv-compression-tool is a pure Python implementation of Huffman compression that I built from the ground up as a learning project. My goal was to understand how data compression works at a low level — not just by using existing libraries, but by manually implementing every part of the pipeline, from frequency counting to bit-level encoding.

The current version of the library is primarily designed and tested for text-based data (UTF-8 bytes). While the underlying Huffman implementation can technically compress any byte sequence, the high-level API is currently built around text input. Future updates may extend it to handle arbitrary binary formats more robustly.
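Because the high-level API works on text, a natural first step is to encode the string as UTF-8 and count byte values. The sketch below assumes that approach; `byte_frequencies` is a hypothetical name, not the library's actual API.

```python
from collections import Counter


def byte_frequencies(text: str) -> dict[int, int]:
    """Count how often each byte value occurs in the UTF-8 encoding of text."""
    data = text.encode("utf-8")  # iterating over bytes yields ints 0..255
    return dict(Counter(data))


print(byte_frequencies("héllo"))  # {104: 1, 195: 1, 169: 1, 108: 2, 111: 1}
```

Note that the non-ASCII character "é" becomes two byte symbols (195 and 169), which is why a byte-level Huffman coder can, in principle, handle any input, not just text.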

The tool includes a complete Huffman workflow: building frequency tables, constructing a min-heap, generating the Huffman tree, creating the code map, and finally packing the encoded bitstring into bytes. It also features a custom header format that stores metadata like padding length and symbol frequencies, allowing the compressed data to be decoded back into the original content.
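The tree-building and code-map steps can be sketched as follows. This is a minimal illustration, not the project's code: it swaps in the standard-library `heapq` module where the project uses its own min-heap, and it breaks ties between equal weights with an insertion counter, which is one assumed way to make merging deterministic. Symbols are byte values, giving a lone symbol the code "0" is an assumption, and the name `build_code_map` is hypothetical.

```python
import heapq
from itertools import count


def build_code_map(freqs: dict[int, int]) -> dict[int, str]:
    """Build a Huffman code map (symbol -> bitstring) from a frequency table."""
    if not freqs:
        return {}  # empty input: nothing to encode
    if len(freqs) == 1:
        (symbol,) = freqs  # a single symbol still needs one bit per occurrence
        return {symbol: "0"}

    tie = count()  # unique counter so equal weights never compare the nodes
    # Heap entries are (weight, tiebreak, node); a leaf is an int symbol,
    # an internal node is a (left, right) tuple.
    heap = [(w, next(tie), sym) for sym, w in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees...
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))  # ...merged

    codes: dict[int, str] = {}

    def walk(node, prefix: str) -> None:
        # Left edges append "0", right edges append "1"; leaves get the path.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


codes = build_code_map({ord("a"): 5, ord("b"): 2, ord("c"): 1})
# codes == {97: "1", 98: "01", 99: "00"}: the most frequent symbol gets the shortest code
```

Deterministic tie-breaking matters here because the decompressor rebuilds the tree from the stored frequencies, so both sides must arrive at exactly the same codes.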

Everything is written with clarity in mind, using a modular structure that separates components like tree construction, bit utilities, header encoding/decoding, and the high-level compressor and decompressor. The project was built using test-driven development, with a full suite of unit and integration tests to ensure reliability.
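The bit utilities and header handling can be sketched as below, under stated assumptions: the header is written as a single JSON line holding the padding length and the frequency table, which is one possible human-readable layout rather than the project's actual format, and all function names are hypothetical. `decode_bits` shows how a prefix-free code map turns the unpacked bitstream back into bytes.

```python
import json


def pack_bits(bits: str) -> tuple[bytes, int]:
    """Pack a '0'/'1' string into bytes; return (data, padding), where padding
    is the number of zero bits appended to fill the final byte."""
    padding = (8 - len(bits) % 8) % 8
    bits += "0" * padding
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data, padding


def unpack_bits(data: bytes, padding: int) -> str:
    """Inverse of pack_bits: expand each byte to 8 bits, then drop the padding."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return bits[:len(bits) - padding]


def encode_header(padding: int, freqs: dict[int, int]) -> bytes:
    """Hypothetical header: one JSON line (keys must be strings), then a newline."""
    header = {"padding": padding, "freqs": {str(k): v for k, v in freqs.items()}}
    return json.dumps(header, sort_keys=True).encode("utf-8") + b"\n"


def decode_header(blob: bytes) -> tuple[int, dict[int, int], bytes]:
    """Split a blob into (padding, freqs, payload) at the first newline."""
    line, _, payload = blob.partition(b"\n")  # JSON escapes newlines, so this is safe
    header = json.loads(line)
    freqs = {int(k): v for k, v in header["freqs"].items()}
    return header["padding"], freqs, payload


def decode_bits(bits: str, code_map: dict[int, str]) -> bytes:
    """Decode a prefix-free bitstring by matching accumulated bits to codes."""
    reverse = {code: sym for sym, code in code_map.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free: the first match is the symbol
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return bytes(out)


data, padding = pack_bits("10100110011")  # 11 bits fill 2 bytes with 5 padding bits
blob = encode_header(padding, {97: 5, 98: 2, 99: 1}) + data
```

The stored padding count is what lets `unpack_bits` discard the filler zeros in the last byte, so the decoder never sees bits that were not part of the original encoding.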

Key Challenges I Solved:

  • Designing a clean, modular architecture — separating frequency analysis, tree building, code mapping, bit utilities, and file I/O into their own modules while keeping the full pipeline easy to follow.
  • Implementing a min-heap from scratch: including sift-up, sift-down, push/pop operations, and validation tests, instead of relying on Python’s standard-library heapq module.
  • Building a fully functional Huffman tree — handling all edge cases like empty input, single-symbol files, and deterministic merging of equal-weight nodes.
  • Bit-level manipulation — packing variable-length bitstrings into bytes, tracking padding, and converting bytes back to exact bitstreams during decompression.
  • Creating a custom header format — encoding padding length and the frequency table in a human-readable structure, then decoding it reliably on the other side.
  • Ensuring correctness with TDD — writing a comprehensive suite of unit and integration tests for every module in the pipeline, including stress-testing the compressor and decompressor.
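A hand-written, array-backed binary min-heap with sift-up and sift-down might look like the sketch below. It is illustrative only, and the class and method names are hypothetical. In a Huffman builder, the items would be comparable tuples such as `(weight, tiebreak, node)`.

```python
class MinHeap:
    """Array-backed binary min-heap: the parent of index i is (i - 1) // 2."""

    def __init__(self) -> None:
        self._items: list = []

    def __len__(self) -> int:
        return len(self._items)

    def push(self, item) -> None:
        # Append at the end, then move the item up until its parent is smaller.
        self._items.append(item)
        self._sift_up(len(self._items) - 1)

    def pop(self):
        # Remove the root: move the last item to the top, then sift it down.
        if not self._items:
            raise IndexError("pop from empty heap")
        items = self._items
        last = items.pop()
        if not items:
            return last
        top, items[0] = items[0], last
        self._sift_down(0)
        return top

    def _sift_up(self, i: int) -> None:
        items = self._items
        while i > 0:
            parent = (i - 1) // 2
            if items[i] < items[parent]:
                items[i], items[parent] = items[parent], items[i]
                i = parent
            else:
                break

    def _sift_down(self, i: int) -> None:
        items = self._items
        n = len(items)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            smallest = i
            if left < n and items[left] < items[smallest]:
                smallest = left
            if right < n and items[right] < items[smallest]:
                smallest = right
            if smallest == i:
                return
            items[i], items[smallest] = items[smallest], items[i]
            i = smallest


heap = MinHeap()
for x in [5, 3, 8, 1, 9, 2]:
    heap.push(x)
print([heap.pop() for _ in range(len(heap))])  # [1, 2, 3, 5, 8, 9]
```

Both operations run in O(log n) time, which keeps tree construction at O(n log n) for n distinct symbols.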

Building this project helped me deepen my understanding of algorithms, binary data handling, recursion, and performance considerations in Python. It was also a great exercise in designing clean, maintainable modules and writing code that is easy to reason about and extend.

I also plan to build a live demo page for the library, where you can experiment with compressing and decompressing text directly in the browser. This will make it easy to explore the tool’s behavior, understand Huffman coding visually, and see how the compression pipeline works step by step.

Technologies Used

  • Python