If you want to make money online : Register now

[Solved]: Variable Length Encoding of Integers Using a Modulus Algorithm

, ,
Problem Detail:

Continuing on the theme from my last question Variable Length Encoding of Integers, I have come up with a simple encoding scheme, but for which an efficient algorithm eludes me.

The constraints are simple enough: no (binary representation) number is allowed where it is divisible by 3, or a subset (prefix) of that representation is divisible by 3.

To terminate the number two bits are added so that the number is divisible by 3.

For example 1101 is allowed since neither 1101, 101, 01 nor 1 are divisible by 3.

However, 1011 is not allowed since 11 is divisible by 3.

The representation 1101 would then have 10 prepended to make it divisible by 3 (101101).

All this allows a stream of bits to be read, at each point testing to see if the number is divisible by three. If not, keep reading the next bit, until it is divisble by three. Hence allowing for a (unique) variable length encoding.

My question is about the mapping of integers on to this encoding scheme. However hard I try I can't seem to create a straightforward algorithm to do the mapping. Is there one?

Clarification

The examples above are to be read in the order right to left. So any prefix is on the right. So the stream of bits will be read right to left.

The first step is counting the number of strings of length $n$ in which no non-empty prefix is divisible by $3$. Surprisingly enough, these are given by Fibonacci numbers $F_n$ for $n > 0$ (exercise). I will show how to map the natural numbers to strings of these forms, from which your encoding will follow. Divide the natural numbers into parts of lengths $1,F_1,F_2,F_3,\ldots$. Using the formula $F_1+\cdots+F_n = F_{n+2}-1$, we see that numbers $m$ in the range $F_{n+1} \leq m < F_{n+2}$ will be encoded using strings of length $n$; additionally, $0$ will be encoded by the empty string. Given $m$, it should be straightforward to find $n$, for example using the Fibonacci recurrence or the approximate formula $F_n \approx \frac{1}{\sqrt{5}} \left(\frac{1+\sqrt{5}}{2}\right)^n$.

Suppose $m > 0$, and let $x = m - F_{n+1}$, so that $0 \leq x < F_n$. It remains to encode $x$ as a string of length $n$ in which no non-empty prefix is divisible by $3$. The idea is to interpret the recurrence $F_n = F_{n-1} + F_{n-2}$ accordingly. There are many ways of doing it, here is one. There are $F_{n-1}$ legal strings of the form $0s$, and $F_{n-2}$ legal strings of the form $1\alpha s$, where $\alpha$ is determined so neither $\alpha s$ nor $1\alpha s$ are divisible by $3$. We can encode the first $F_{n-1}$ values of $x$ using strings of the form $0s$, and the latter $F_{n-2}$ values of $x$ using strings of the form $1\alpha s$. To determine $s$ itself, we run this procedure recursively. The base case is the unique string of length $1$, namely $1$.

To illustrate the entire procedure, let us show how to encode $m = 7$. First we find that $F_5 = 5 \leq 7 < F_6 = 8$, and so $n = 4$ and $x = 7 - 5 = 2$. We write $F_4 = F_3 + F_2 = 2 + 1$, so that the encoded string is of the form $s_4 = 1\alpha_4 s_2$. The string $s_2$ itself is the encoding of $x - F_3 = 0$ of length $2$. Writing $F_3 = F_2 + F_1 = 1 + 1$, we see that $s_2$ has the form $s_2 = 0s_1$. The string $s_1$ itself is the encoding of $0$ of length $1$, which is $s_1 = 1$. Therefore $s_4 = 1\alpha_4 s_2 = 1\alpha_4 0s_1 = 1\alpha_4 01$. If $\alpha_4 = 0$ then $1001$ would be divisible by $3$, so $\alpha_4 = 1$ and the output is $1101$.

Question Source : http://cs.stackexchange.com/questions/28782