Cheap and Secure Web Hosting Provider : See Now

[Solved]: Resolve left-rescursion

, ,
Problem Detail:

Can anybody give me a hint on how to get rid of the left recursion in the following grammar?

$$A \rightarrow B \mid a$$ $$B \rightarrow b \mid C \mid D \mid E \mid F \mid G$$ $$C \rightarrow c \mid A d$$ $$D \rightarrow A e \mid f$$ $$E \rightarrow g \mid B h$$ $$F \rightarrow i \mid A j \mid k$$ $$G \rightarrow l \mid A m \mid n$$

So far I've tried this algorithm and this tool. But both are only applicable for smaller grammars and provide practically unusable results (1000's of rules).

This explanatory part can be skipped :

Non-terminal in a Context-Free Grammar can be interpreted as standing for the set of strings they generate (this is actually one standard interpretation of CF grammars as string set equations).

For example $A \rightarrow C \mid D \mid a$ may be read as a set equation $A = C \cup D \cup \{a\}$ where $A$, $C$ and $D$ now stand for sets of strings and {a} for a singleton set containing the string "$a$".

Algebraic properties of concatenation and set union on strings and string sets can be used to transform grammars. In particular, you can use factorization in ways that are similar to what you do in arithmetics to make formulae simpler to compute.

Typically, if $A$, $B$ and $C$ are string sets (possibly reduced to a simgleton, i.e. a single string), then $(A \cup B)C = AC \cup BC$, i.e. concatenation is distributive for set union, like product is for sum.

How to do it

So, when you transform a CF grammar, you can first reduce the size with factorization. For that, you may replace some non-terminals by the right-hand-side of their defining rule, or introduce new non-terminals when convenient.

$A \rightarrow C \mid D \mid a$
$C \rightarrow c \mid A d$
$D \rightarrow A e \mid f$

can be rewritten as

$A \rightarrow X \mid a$
$X \rightarrow AY \mid Z$
$Y \rightarrow d\mid e$
$Z \rightarrow c\mid f$

There are various ways of using this, but you should be able to use it to do away with the complexity of your grammar.

I deliberately did not make it as simple as I could have. My purpose is only to give you an idea of what can be done.

The correctness of transformations can also be proved directly on derivations, without having to consider interpretations on string sets as I did above.

Complete solution

Here is a suggested answer. Hoping to simplify presentation, I used extended context-free notation that allows for regular expressions on the right-hand side (RHS) of rule. It does not change the practical uses and properties of CF grammars. You can do it as well by introducing new non-terminals. This was not really necessary here, but I found it convenient while doing the transformations. In retrospect, it would be simpler without it.

With this convention the rule $X\rightarrow Y\mid Z$ is equivalent to $X\rightarrow Y+Z$. Essentially the operators ¨$+$" and ¨$\mid$" have the same role.

Starting with your grammar, I first eliminate some non-terminals by replacing them by the corresponding RHS. I do that in the rule for $B$, and it gives:

$A\rightarrow B \mid a$
$B\rightarrow (b+c+f+g+i+k+l+n) \mid A(d+e+j+m) \mid Bh$

Then, substituting $A$ by the RHS of its rule: $A\rightarrow B \mid a$
$B\rightarrow (b+c+f+g+i+k+l+n) \mid (B+a)(d+e+j+m) \mid Bh$

and simplifying

$A\rightarrow B \mid a$
$B\rightarrow (b+c+f+g+i+k+l+n) \mid a(d+e+j+m) \mid B(h+d+e+j+m)$

And going back tu usual CF notation:

$A\rightarrow B \mid a$
$B\rightarrow BX \mid Y$
$X\rightarrow h \mid U$
$U\rightarrow d\mid e\mid j\mid m$
$Y\rightarrow aU \mid V$
$V\rightarrow b\mid c\mid f\mid g\mid i\mid k\mid l\mid n$

Now you only have to remove the left recursion from the $B$ rule in the usual way, replacing it by:

$B\rightarrow Y \mid YH$
$H\rightarrow X \mid XH$

and you are set.