Compression

Vocabulary

English	Chinese	Pinyin
compression	压缩	yā suō
bandwidth	带宽	dài kuān
lossless	无损	wú sǔn
lossy	有损	yǒu sǔn
run-length encoding	行程编码	xíng chéng biān mǎ
dictionary coding	字典编码	zì diǎn biān mǎ
Huffman coding	霍夫曼编码	huò fū màn biān mǎ

Making files smaller

Compression 压缩 reduces a file's size — saving storage and transmission bandwidth 带宽.
There are two families: lossless 无损 and lossy 有损.
Choosing the right one depends on whether you can afford to lose any detail.

Lossless vs lossy

Lossless — the original data is recovered exactly (ZIP, PNG; text, programs).
Lossy — some detail is thrown away for much smaller files (JPEG, MP3, video).
Use lossless for documents, source code, medical images — anything needing exact data.
Use lossy for streaming media. Real-time video has to push a huge amount of data through limited bandwidth, so it relies on lossy compression: lossless would not shrink the stream enough, and playback would stall while waiting for data.
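A rough calculation shows the scale of the problem. The figures here (1080p resolution, 30 frames per second, 3 bytes per pixel, a 10 Mbit/s connection) are illustrative assumptions, not values from this lesson, and `raw_video_bitrate` is a hypothetical helper name.

```python
def raw_video_bitrate(width, height, fps, bytes_per_pixel=3):
    """Bits per second needed to send uncompressed video."""
    return width * height * bytes_per_pixel * fps * 8


# Assumed example: 1920x1080 pixels, 30 frames per second, 24-bit colour.
raw_bits_per_second = raw_video_bitrate(1920, 1080, 30)

# Assumed example connection: 10 Mbit/s.
link_bits_per_second = 10_000_000

# How many times smaller the stream must become to fit the link.
required_ratio = raw_bits_per_second / link_bits_per_second

print(f"raw video: {raw_bits_per_second / 1e9:.2f} Gbit/s")
print(f"needs to shrink about {required_ratio:.0f} times to fit the link")
```

Under these assumptions the stream has to shrink roughly 150-fold. Lossless methods usually manage only a few-fold reduction on natural footage, which is why streaming falls back on lossy compression.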

[Figure: compression methods, lossless vs lossy, with common examples]

Practice

Lossless compression means:

the exact original data can be recovered
some detail is permanently lost
the file gets bigger
the data is encrypted

Practice

Which file should be compressed losslessly?

a program's source code
a music track for streaming
a video for a website
a photo gallery thumbnail

Practice

Why does real-time video streaming use lossy compression?

it must send huge amounts of data over limited bandwidth, and lossless would not shrink it enough
lossy keeps every pixel perfect
lossless is illegal for video
video has no detail to lose

Practice

Match each compression idea to what it means.

rebuilds the original data exactly

discards some data for a smaller size

replaces a run with a value plus a count

text, programs, ZIP and PNG

Lossless
Lossy
Run-length encoding
Use lossless for

Practice

Lossless compression rebuilds the original data exactly (needed for text and programs), while lossy compression permanently removes some data to shrink the file (used for photos and audio).
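The difference can be seen on a handful of bytes. This is a minimal sketch: the lossless half uses Python's standard `zlib` module, and the lossy half uses simple rounding to steps of 16 as a stand-in for what real lossy formats do. The sample values and the step size are made up for illustration.

```python
import zlib

# Made-up sample data, e.g. brightness values from a small image strip.
original = bytes([12, 13, 15, 200, 201, 203, 90, 91] * 4)

# Lossless: compress, then decompress. Every byte comes back unchanged.
restored = zlib.decompress(zlib.compress(original))

# Lossy (toy version): keep only which step of 16 each value falls in.
STEP = 16
quantized = bytes(v // STEP for v in original)

# Rebuilding uses the middle of each step, so small differences are gone.
approx = bytes(q * STEP + STEP // 2 for q in quantized)

print(restored == original)   # lossless round trip
print(list(original[:8]))
print(list(approx[:8]))       # close to the original, but not equal
```

The quantized values need only 4 bits each instead of 8, but 12, 13 and 15 all come back as the same number: that detail cannot be recovered.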

Lossless methods

Run-length encoding 行程编码 (RLE) — store "the next $n$ values are $x$" instead of repeating $x$. Great for flat areas; useless for noisy data.
Dictionary coding 字典编码 (ZIP, PNG) — replace repeated byte sequences with a short reference. Good for text and code.
Huffman coding 霍夫曼编码 — give short codes to common symbols and long codes to rare ones, lowering the average length.
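Dictionary coding can be sketched in a few lines. Note the assumption: this is a toy LZ78-style coder, chosen because it is short. ZIP and PNG use a related but different scheme (DEFLATE), so this shows the idea rather than their actual format. The function names are hypothetical.

```python
def lz78_encode(text):
    """Encode text as (dictionary index, next character) pairs."""
    dictionary = {"": 0}          # phrase -> index; index 0 is the empty phrase
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending a phrase we have seen before
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                    # flush whatever is left at the end
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decode(pairs):
    """Rebuild the text by replaying the same dictionary."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


encoded = lz78_encode("abababab")
print(encoded)
print(lz78_decode(encoded))
```

Each pair means "the phrase stored at this index, followed by this character". The more repetition the input has, the longer the phrases grow and the fewer pairs are needed.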

Explore

Run-length encoding (a lossless method)

Run-length encoding replaces a run of repeated values with one value plus a count. It is lossless — the original rebuilds exactly — but only shrinks data that has long runs.
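A minimal sketch of run-length encoding, assuming the data is a simple list of values (for example pixel colours in one row). The names `rle_encode` and `rle_decode` are made up for this example.

```python
from itertools import groupby


def rle_encode(values):
    """Turn a sequence into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run length) pairs back into the full sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


row = list("WWWWWWBBBW")          # a row with long runs
encoded = rle_encode(row)
print(encoded)                    # 3 pairs instead of 10 values
print(rle_decode(encoded) == row)

noisy = [1, 2, 3, 4]              # no runs at all
print(rle_encode(noisy))          # 4 pairs for 4 values: nothing saved
```

The noisy case shows the limit: when no value repeats, every value still needs its own pair, so the encoded form is no smaller than the input.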

Practice

Run-length encoding works best on data that has:

long runs of repeated values (e.g. flat areas of colour)
completely random noise
no repetition at all
encrypted content

Practice

Huffman coding reduces size by:

giving short codes to common symbols and long codes to rare ones
deleting every second byte
rounding numbers
storing only the differences between frames

Lossy methods

Images (JPEG) — drop fine detail and colour differences the eye barely notices.
Sound (MP3, AAC) — drop pitches we hear poorly, and quiet sounds masked by louder ones.
Video — combines spatial compression (within each frame, like JPEG) with temporal compression (most frames store only the differences from the previous frame).
Compression applies to text, bitmap image and vector graphic files alike.
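The temporal idea can be sketched with frames as short lists of pixel values. This toy version stores the first frame in full and, for each later frame, only the pixels that changed. It is a simplification: real video codecs also predict motion and apply lossy steps, none of which is shown here. All names are hypothetical.

```python
def encode_delta(previous, frame):
    """List (position, new value) for every pixel that changed."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, frame))
        if old != new
    ]


def apply_delta(previous, delta):
    """Rebuild a frame from the previous frame plus its changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


def encode_video(frames):
    """Keep the first frame whole, then only differences."""
    deltas = [encode_delta(a, b) for a, b in zip(frames, frames[1:])]
    return list(frames[0]), deltas


def decode_video(first, deltas):
    frames = [list(first)]
    for delta in deltas:
        frames.append(apply_delta(frames[-1], delta))
    return frames


# Made-up clip: a bright pixel (9) appears and spreads on a dark background.
frames = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],
    [0, 0, 9, 9, 0, 0],
]
first, deltas = encode_video(frames)
print(deltas)
print(decode_video(first, deltas) == frames)
```

When little changes between frames, each difference list is far shorter than a full frame, which is where the saving comes from.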

Practice

Temporal compression of video works by:

storing only the differences from the previous frame
compressing each frame like a JPEG
removing the soundtrack
lowering the screen resolution

Building a Huffman tree: repeatedly merging the two lowest-frequency nodes builds a tree. Reading 0 for each left branch and 1 for each right branch on the way down to a letter gives that letter's code, so the most common letter gets the shortest code.
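The merging procedure can be sketched with Python's standard `heapq` module. This is one possible way to write it, not a reference implementation: the function name, the tie-breaking counter and the sample text are assumptions made for the example.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return {symbol: bit string} built by merging the two rarest nodes."""
    freq = Counter(text)
    # Heap entries are (weight, tie_breaker, node).
    # A node is either a single symbol or a (left, right) pair.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:            # only one distinct symbol: give it one bit
        return {heap[0][2]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lowest frequency
        w2, _, right = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (w1 + w2, next_id, (left, right)))
        next_id += 1

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")      # left branch reads 0
            walk(node[1], prefix + "1")      # right branch reads 1
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


text = "aaaaaaaabbbc"             # 'a' is common, 'c' is rare
codes = huffman_codes(text)
total_bits = sum(len(codes[ch]) for ch in text)
print(codes)
print(total_bits, "bits, versus", len(text) * 8, "at 8 bits per letter")
```

In this sample the common letter `a` ends up with a 1-bit code while `b` and `c` get 2 bits each, so the 12 letters need 16 bits in total.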

You've got it

Key idea

lossless = exact recovery (ZIP/PNG, text, code); lossy = detail dropped (JPEG/MP3/video)
lossless methods: RLE (runs), dictionary (repeated sequences), Huffman (common = short code)
real-time video streaming uses lossy — limited bandwidth, huge data
video = spatial (per frame) + temporal (differences between frames) compression
