Run-length encoding (compression)
Handout
This page needs a recent browser (with SharedArrayBuffer support). Please update Chrome, Edge, Firefox or Safari to the latest version.
Squeezing repeated data
- Compression makes data smaller. Run-length encoding (RLE) is one simple way.
- It works well when the same value repeats many times in a row.
- It is lossless: from the squeezed form you can rebuild the exact original.
What a "run" is
- A run is a stretch of the same character repeated: in
"aaabbc","aaa"is a run of 3. - RLE replaces each run with the character followed by how many times it repeats.
- So
"aaabbc"becomes"a3b2c1"— much shorter when runs are long.
Counting a run
- To measure a run, look at a character, then count how many of the same character follow it.
- Stop when the next character is different, or you reach the end (
'\0'). - That count is the run's length.
Building the encoded string
- Walk the input. For each run, write the character, then its count, into the output.
- Move your input position past the whole run before starting the next one.
- End the output string with
'\0'so it is a proper C string.
#include <stdio.h>
int main(void) {
const char *s = "aaab";
int i = 0;
char c = s[i];
int count = 0;
while (s[i] == c) { // count the first run
count++;
i++;
}
printf("%c%d\n", c, count); // a3
return 0;
}
Now you try
- Find each run, then write the character and its count to the output.
- The caller gives you an output buffer big enough to hold the result. Do not write a
main.
Complete int run_length_at(const char *s, int i) so it returns how many times the character s[i] repeats starting at index i. Do not write a main.
Click Run to see the output here.
Complete void rle_encode(const char *in, char *out) so it writes the run-length encoding of in into out: each run becomes the character then its count. "aaabbc" becomes "a3b2c1". End out with '\0'. Do not write a main.
Click Run to see the output here.