Cheap and Secure Web Hosting Provider : See Now

# [Solved]: CFGs: detecting infinitely many derivations of a single string

, ,
Problem Detail:

Some particularly degenerate CFGs can produce a single string in infinitely many ways: for a dumb example, $S \to SS \mid \epsilon$ can produce the empty string as $S \to \epsilon$ or $S \to SS \to S \to \epsilon$, etc.

I'm writing a general CFG parser which can produce all the different parses for a given string, but obviously it chokes on examples like the above. I'd like to be able to detect when there may be infinitely many parses for a string, rather than just looping forever. Can I, and, if so, how?

My intuition is that a grammar has this property if and only if there is some non-useless nonterminal A (ie, A is used in the production of some string in the language of the CFG) which can produce itself via some chain of productions such that all of the other symbols produced by following that chain are nullable nonterminals of A itself. But I don't know if this is correct, and I'm not sure how to detect it.

(A nonterminal B is nullable if $B \to^* \epsilon$, ie, some series of productions replaces the symbol with the empty string.)

If a nonterminal can derive itself ($A \to A$) then you can have an infinite number of derivations deriving whatever else that nonterminal can derive. There is no need for it or any other nonterminal to derive $\epsilon$ (unless you are only concerned with an infinite number of parses for the empty string.)

In other words,

$$A \to A \mid a$$

can produce infinitely many parses for the string $a$. By contrast,

$$A \to A A \mid a$$

can only produce exponentially many parses for the string $a^n$.

As far as I know, the existence of a useful nonterminal which can (possibly indirectly) derive itself is necessary and sufficient for the existence of an infinite number of possible parses.

It's easy to detect whether a nonterminal can derive itself. First, use the standard algorithm to find all nullable nonterminals. Then define the relation $$R = \{ (A, B) \mid A \to \omega B \chi \wedge \omega \to \epsilon \wedge \chi \to \epsilon \}$$ which you can do in linear time by scanning all the productions keeping track of nullability.

Then construct $R^+$. If it contains $(A, A)$ for any nonterminal, you have a possibly indirect self-derivation.