Data compression
Handout
This page needs a recent browser (with SharedArrayBuffer support). Please update Chrome, Edge, Firefox or Safari to the latest version.
Why compress data
- Data takes up space to store and time to send.
- Compression makes a file smaller so it is cheaper to save and faster to share.
- There are two kinds: lossless and lossy.
Lossless compression
- Lossless makes a file smaller but keeps every bit of the data.
- When you open the file, you get back the exact original.
- It works by finding patterns and writing them in a shorter way.
Original: AAAAAAAA-BBB
Shorter: 8A-3B (8 A's, then 3 B's)
Open it: AAAAAAAA-BBB (exactly the same again)
Lossy compression
- Lossy makes a file much smaller by throwing away some data.
- It drops details that people can barely see or hear.
- You cannot get the exact original back — but it is "close enough".
Photo (large) --lossy--> Photo (small)
A few colors and fine details are gone,
but your eye hardly notices.
The trade-off: size vs quality
- Lossless keeps full quality, but the file stays larger.
- Lossy gives a much smaller file, but quality goes down a little.
- You choose based on what matters more: perfect data or small size.
When to use each
- Use lossless when every detail must be exact.
- Use lossy when a small drop in quality is fine and small size matters.
Lossless: text, code, a .zip file, a spreadsheet
Lossy: photos (JPEG), music (MP3), video
Key idea
- Compression trades size against quality (or against work to undo it).
- Lossless = smaller and perfect; lossy = much smaller but not exact.
- Good engineers pick the right kind for the job.
Run-length encoding
- Run-length encoding (RLE) is a simple lossless method.
- A run is a stretch of the same character repeated. RLE stores a count instead of the repeats.
- We will store each run as a pair
[character, count]inside a list.
"AAAB" -> [["A", 3], ["B", 1]] (3 A's, then 1 B)
[["A", 3], ["B", 1]] -> "AAAB" (decode it back — exact again)
Now you try
- Build RLE yourself: an encoder, a decoder, and a length helper.
- Each task checks your function on several inputs. Press Check answer.
Write encode(text) for run-length encoding. Return a list of [character, count] pairs, one per run of repeats. Example: encode("AAAB") → [['A', 3], ['B', 1]]. For the empty string return [].
Click Run to see the output here.
Write decode(pairs) that reverses the encoder: given a list of [character, count] pairs, rebuild the original string. Example: decode([['A', 3], ['B', 1]]) → 'AAAB'. For [] return ''.
Click Run to see the output here.
Without decoding, write original_length(pairs) that returns how many characters the original text had — just add up the counts. Example: original_length([['A', 3], ['B', 1]]) → 4.
Click Run to see the output here.