Cheap and Secure Web Hosting Provider : See Now

# [Solved]: Semantic Code Comparison

, ,
Problem Detail:

Consider two codes that do the same thing, in the same time and memory order, But they don't do it exactly the same way. Is there any Idea for a program to declare those codes as the same?

For Example:

Change Vars 1

``a = a + b; b = a - b; a = b - a; ``

Change Vars 2

``int temp = a; a = b; b = temp; ``

or consider two bubble sort codes that sort ascending, one starting from the end, one from the beginning of the array.

There is the idea to execute both codes and then check if the memory status is similar after the execution of both codes. But that's not very comprehensive.

Edit: My focus is on Turing complete languages mostly.

## The dark side of the question

Computation theory tells us that checking whether two programs, or program fragments, are equivalent is not decidable.

What that means is only that there is no unique technique that can check that equivalence of any pair of programs. This remains true if you consider a single programming language, as long as it is Turing complete. (Note: I do not understand what you intend when mentionning Turing complete algorithms in your comment - and, by the way, precisions should be integrated in the question, preferably to comments).

No unique technique also means no unique finite set of techniques as they could be applied simultaneously. It also means no infinite set of techniques that is finitely describable, etc.

This can be formally proved on Turing machine with Rice theorem (which is a bit subtle to use). The proof can be tediously transposed to any other Turing complete formalisation of computation. But invoking Church-Turing thesis is usually considered enough.

To summarize it, there is no way you can produce a system that will take two arbitrary program fragments of a Turing Complete language and tell you when they are equivalent semantically, i.e. when their computation results are the same.

But do not despair, there is hope.

## The bright side of the question

While the above is true when you make the question so general, it does not mean that this can never be done. Actually, this, or problems very close to it, is the object of considerable reasearch. The undecidability statement should only be seen as a limitation to what is to be expected, but there is considerable room within that limitation.

So, there are many situation when it is actually possible to apply a procedure that will actually decide whether two (fragments of) programs are actually equivalent. The applicability of such a procedure can be defined by a language (that is not Turing complete), or by some limitation on the computational power such fragments can express (so that they do not have to be in the same language).

Much of the research related to type theory also concerns provability of programs properties, and can lead to answers to your question. But that is much outside my competence.

Many other techniques have been developed for your purpose.

Your idea of running both codes and comparing results is a good one, at least in simple cases, like your example.

But you have to run the code symbolically, and then use a symbolic computation system to check that the answers are indeed the same.

So, assume that initially a==\$a\$ and b==\$b\$, where I use italics for symbolic expressions, i.e. non evaluated formulae. The symbols \$a\$ and \$b\$ just stand for themselves, and have no associated value.

running the first code:

a = a + b; - - so a==\$a+b\$
b = a - b; - - so b==\$(a+b)-b\$
a = a - b; - - so a==\$(a+b)-((a+b)-b)\$

Recall, again, that what is in italics is just symbolic expressions, trees if you prefer. There is nothing to be computed.

running the second code:

int temp = a; - - so temp==\$a\$
a = b; - - so a==\$b\$
b = temp;; - - so b==\$a\$

Now you give these results to a symbolic calculator that check that the values of variables a and b are the same at the end. It must be able to simplify expressions such as \$(a+b)-b\$ and \$(a+b)-((a+b)-b)\$ to respectively \$a\$ and \$b\$, which requires using known algebraic properties of the operators \$+\$ and \$-\$.

It is actually a good technique (when applied properly - I goofed my first try),as it allowed me to notice that your two codes were not equivalent, and I corrected the first one.

Running the code symbolically is an example of a general paradigm called abstract interpretation. The "casting out nines" test is a very elementary example of these techniques.

Symbolic evaluation and abstract interpretations are a well studied way of proving things about programs.

It is pretty much what is done by type checkers.

But it is far from the whole story about proving things about programs. Large systems are being developed to prove properties of programs by different means.

In other cases (probably for your sorting examples), it may be better to have a specification of what the program is supposed to do and separately prove the two pieces of code conformant with that specification. This avoid having to consider simultaneously the specifics of two algorithms.