Compression
Making files smaller
- Compression reduces a file's size — saving storage and transmission bandwidth.
- There are two families: lossless and lossy.
- Choosing the right one depends on whether you can afford to lose any detail.
Lossless vs lossy
- Lossless — the original data is recovered exactly (ZIP, PNG; text, programs).
- Lossy — some detail is thrown away for much smaller files (JPEG, MP3, video).
- Use lossless for documents, source code, medical images — anything needing exact data.
- Use lossy for streaming media. Real-time video has to push a huge amount of data through limited bandwidth, and lossless compression cannot shrink it enough, so the picture would freeze while waiting for data.
Lossless compression means:
Lossless compression lets you rebuild the original data exactly, which is essential for text and programs and is what ZIP and PNG provide.
Which file should be compressed losslessly?
Source code must be recovered exactly — a single changed character could break it — so it needs lossless compression.
Why does real-time video streaming use lossy compression?
Raw HD video runs to several gigabytes per minute (roughly 11 GB at 1080p, 30 frames per second, 24-bit colour); only lossy compression shrinks it enough to stream in real time without freezing.
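The size of raw video can be checked with a little arithmetic. This is a rough sketch only: the resolution, colour depth and frame rate below are assumed example values, not figures from the lesson.

```python
# Assumed example format (not from the text): 1080p, 24-bit colour, 30 frames per second.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3
FRAMES_PER_SECOND = 30

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND
bytes_per_minute = bytes_per_second * 60

# Bandwidth an uncompressed stream would need, in megabits per second.
megabits_per_second = bytes_per_second * 8 / 1_000_000

print(f"{bytes_per_frame / 1e6:.1f} MB per frame")
print(f"{bytes_per_minute / 1e9:.1f} GB per minute")
print(f"{megabits_per_second:.0f} Mbit/s uncompressed")
```

Under these assumptions the uncompressed stream needs well over a gigabit per second, far more than a typical home connection offers.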
Lossless methods
- Run-length encoding (RLE): store "the next $n$ values are $x$" instead of repeating $x$. Great for flat areas; on noisy data, where runs are short, it saves nothing and can even make the file larger.
- Dictionary coding (ZIP, PNG) — replace repeated byte sequences with a short reference. Good for text and code.
- Huffman coding — give short codes to common symbols and long codes to rare ones, lowering the average length.
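To see dictionary coding at work, here is a small sketch using Python's standard `zlib` module, which implements DEFLATE, the method used by ZIP and PNG. The sample sentence and the random bytes are made-up test data.

```python
import random
import zlib

# Made-up test data: one sentence repeated, and random bytes of the same length.
repetitive = b"the cat sat on the mat. " * 40
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(len(repetitive)))

packed_repetitive = zlib.compress(repetitive)
packed_noisy = zlib.compress(noisy)

print(len(repetitive), "->", len(packed_repetitive), "bytes (repeated text)")
print(len(noisy), "->", len(packed_noisy), "bytes (random bytes)")

# Lossless: decompressing gives back exactly what went in.
assert zlib.decompress(packed_repetitive) == repetitive
assert zlib.decompress(packed_noisy) == noisy
```

The repeated text shrinks a great deal because each repeat becomes a short back-reference, while the random bytes hardly shrink at all.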
Run-length encoding works best on data that has:
RLE replaces a run of identical values with a count and the value, so it shines on data with long runs (flat areas) and gains nothing on noisy data.
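A minimal sketch of run-length encoding, assuming each run is stored as a (count, value) pair. The function names and the two sample rows of "pixels" are illustrative only.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of identical values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    out = []
    for count, value in pairs:
        out.extend([value] * count)
    return out


flat = list("WWWWWWWWBBBWWWW")   # long runs: a flat area
noisy = list("WBWBGWBGRWBGRWB")  # no two neighbours match

flat_pairs = rle_encode(flat)
noisy_pairs = rle_encode(noisy)

print(flat_pairs)        # [(8, 'W'), (3, 'B'), (4, 'W')]
print(len(noisy_pairs))  # 15 pairs for 15 values, so nothing is saved
```

Decoding either list of pairs with `rle_decode` returns the original row exactly, which is what makes RLE lossless.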
Huffman coding reduces size by:
Huffman assigns the shortest codes to the most frequent symbols, lowering the average code length.
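The idea can be sketched in a few lines. This is an illustrative version, not a production encoder: the function name `huffman_codes` and the sample string are assumptions, and the codes are kept as strings of "0" and "1" for readability.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a prefix code where frequent symbols get shorter bit strings."""
    counts = Counter(text)
    # Heap entries: (total count, unique tie-breaker, {symbol: code so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


text = "aaaaaaaabbbbccd"  # 'a' is common, 'd' is rare
codes = huffman_codes(text)
encoded_bits = sum(len(codes[ch]) for ch in text)
fixed_bits = 2 * len(text)  # 4 distinct symbols need 2 bits each at fixed length

print(codes)
print(encoded_bits, "bits with Huffman vs", fixed_bits, "bits fixed-length")
```

Here the common symbol `a` gets a one-bit code and the rare symbols get three bits, so the message takes 25 bits instead of 30.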
Lossy methods
- Images (JPEG) — drop fine detail and colour differences the eye barely notices.
- Sound (MP3, AAC): drop frequencies the ear hears poorly, and quiet sounds masked by louder ones.
- Video — combines spatial compression (within each frame, like JPEG) with temporal compression (most frames store only the differences from the previous frame).
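What "dropping detail" means can be shown with a toy example. The sketch below uses simple quantisation: it is not the actual JPEG or MP3 algorithm, and the sample values and step size are made up, but it shows the same trade of accuracy for size.

```python
def quantise(samples, step):
    """Keep only which step-sized band each sample falls in."""
    return [s // step for s in samples]


def dequantise(levels, step):
    """Rebuild each sample as the middle of its band; fine detail is gone."""
    return [q * step + step // 2 for q in levels]


samples = [12, 13, 15, 200, 203, 198, 90, 91]  # made-up 8-bit values (0 to 255)
step = 16                                      # 256 / 16 = 16 bands, so 4 bits each

levels = quantise(samples, step)
restored = dequantise(levels, step)
max_error = max(abs(a - b) for a, b in zip(samples, restored))

print(levels)     # [0, 0, 0, 12, 12, 12, 5, 5]
print(restored)   # [8, 8, 8, 200, 200, 200, 88, 88]
print(max_error)  # 7
```

Each value now needs 4 bits instead of 8, and the small differences between neighbours have disappeared. The original can no longer be recovered exactly, which is what makes the method lossy.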
Temporal compression of video works by:
Temporal compression stores how each frame differs from the one before, since most of the picture stays the same between frames. (Spatial compression handles within-frame detail.)
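A minimal sketch of the frame-difference idea, assuming each frame is a flat list of pixel values. The function names and the tiny 8-pixel frames are illustrative; real video codecs are far more elaborate (they also track motion and discard detail), so treat this only as a picture of "store what changed".

```python
def encode_frames(frames):
    """Store the first frame in full, then only the pixels that change."""
    key_frame = list(frames[0])
    deltas = []
    previous = key_frame
    for frame in frames[1:]:
        changes = [(i, new) for i, (old, new) in enumerate(zip(previous, frame))
                   if old != new]
        deltas.append(changes)
        previous = frame
    return key_frame, deltas


def decode_frames(key_frame, deltas):
    """Rebuild every frame by applying each list of changes in turn."""
    frames = [list(key_frame)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, value in changes:
            frame[i] = value
        frames.append(frame)
    return frames


# Made-up clip: one bright pixel (9) moving right across a dark background (0).
clip = [
    [0, 0, 0, 0, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0, 0, 9, 0],
]

key_frame, deltas = encode_frames(clip)
print(deltas)  # [[(4, 0), (5, 9)], [(5, 0), (6, 9)]]
```

Each later frame is described by two changes rather than eight pixel values, and `decode_frames(key_frame, deltas)` rebuilds the clip.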
You've got it
- lossless = exact recovery (ZIP/PNG, text, code); lossy = detail dropped (JPEG/MP3/video)
- lossless methods: RLE (runs), dictionary (repeated sequences), Huffman (common = short code)
- real-time video streaming uses lossy — limited bandwidth, huge data
- video = spatial (per frame) + temporal (differences between frames) compression