If you want to make money online : Register now

# Compression of a text that contains only the 5 first letters of the alphabet?

, ,
Problem Detail:

So the question is: "Imagine the compression of a text written only with the first 5 characters of the alphabet (a, b, c, d, e). Knowing that their apparition's frequency is 1)e, 2)a, 3)c, 4)d, 5)b, code the letters on the minimum bits possible."

so here's how I did start:

We have 5 characters, so 3bits will be enough for the characters (2^3 =8). We can put: e=001, a=010, c=100, d=011, b=110 But I don't see how to use their frequency (we don't have the value of it, they just gave a "scale" of their frequency)

###### Asked By : Chrome Obie

Just so you get an idea: You could encode f letters by assigning 00 = a, 01 = b, 10 = c, 110 = d, 111 = e. If their frequencies are a ≥ b ≥ c ≥ d ≥ e, then this will be the optimal fixed-width encoding unless a is very common. If a was very common then you would assign 0 = a and find a good way to encode b, c, d and e. Anyway, this would save one bit every time a letter a, b or c is used.

If you decided to encode 0 = a, 100 = b, 101 = c, 110 = d, 111 = e, that would save one bit compared to the first encoding every time you encode an a, and lose one bit every time you encode b or c. So this is a better encoding if the freq (a) > freq (b) + freq (c). One more possibility is 0 = a, 10 = b, 110 = c, 1110 = d, 1111 = e; you'd have to calculate the average number of bits for this case as well.

(And google for Huffman to find this method fully worked out).

Yes, Huffman coding would take the two codes with the lowest frequences and combine them into one. For example, d and e are combined into one code de, and you follow for "de" by 0 to get a d, and you follow the code for "de" by 1 to get an e. And then you pick again the two codes with the lowest frequencies. Which would be either b and c, or c and "de". And so on.