Cheap and Secure Web Hosting Provider : See Now

[Solved]: Skienna's recursive algorithm for edit distance

, , No Comments
Problem Detail: 

I'm having some trouble understanding part of Skienna's algorithm for edit distance presented in his Algorithm Design Manual. I'm posting the recursive version, prior to when he applies dynamic programming to the problem, but my question still stands in that version too I think.

He provides the following code:

#define MATCH       0 #define INSERT      1 #define DELETE      2  int match(char c, char d) {     if (c == d) return(0);     else return(1); }  int indel(char c) {     return(1); }  int string_compare(char *s, char *t, int i, int j) {         int k;                  /* counter */         int opt[3];             /* cost of the three options */         int lowest_cost;        /* lowest cost */          if (i == 0) return(j * indel(' '));         if (j == 0) return(i * indel(' '));          opt[MATCH] = string_compare(s,t,i-1,j-1) + match(s[i],t[j]);         opt[INSERT] = string_compare(s,t,i,j-1) + indel(t[j]);         opt[DELETE] = string_compare(s,t,i-1,j) + indel(s[i]);          lowest_cost = opt[MATCH];         for (k=INSERT; k<=DELETE; k++)                 if (opt[k] < lowest_cost) lowest_cost = opt[k];          return( lowest_cost ); } 

I am having trouble understanding the logic behind how the indices are decremented when arriving at opt[INSERT] and opt[DELETE]. They seem backwards to me.

How can I prove to myself that they are correct?

Asked By : Chris

Answered By : babou

The two strings s and t are compared starting from the high index, down to index 1. This is not visible since the initial call to string_compare is not provided. The i and j arguments for that initial call are the length of strings s and t.

It should be noted that s and t could be globals, since they are the same in all calls.

A call to the function string_compare(s,t,i,j) is intended to compute the minimum edit distance of the prefixes s[1..i] and t[1..j]. It first compares the two strings at indices i and j, and the recursively at lower indices.

It always tries 3 ways of finding the shortest distance:

  1. by assuming there was a match or a susbstitution edit depending on whether s[i]==t[j];

  2. by assuming there is an insertion edit of t[j];

  3. by assuming there is an deletion edit of s[i];

Then it computes recursively the sortest distance for the rest of both strings, and adds 1 to that result, when there is an edit on this call.

When s[i]==t[j] the two strings match on these indices. Hence the corresponding indices are both decremented, to recursively compute the shortest distance of the prefixes s[1..i-1] and t[1..j-1]. A proper match does not increase the distance. Hence dist(s[1..i],t[1..j])= dist(s[1..i-1], t[1..j-1]).

When s[i]=/=t[j] the two strings do not match, but can be made to match by a substitution edit. Hence the same recursive call is possible, but the resulting shortest distance must be incremented by one for the substitution edit. Hence dist(s[1..i],t[1..j])= dist(s[1..i-1], t[1..j-1])+1.

That is why the function match returns 0 when there is a match, and 1 when there is none.

Another possibility is not to try for a match, but assume that t[j] is due to an insertion edit in the case of the smallest distance. Hence that inserted symbol is ignored by replacing t[1..j] by t[1..j-1], ie by computing the shortest distance of s[1..i] and t[1..j-1], which is string_compare(s,t,i,j-1), and then adding 1 for the insertion edit.

The next and last try is the symmetric one, when one assume that the symbol s[i] was deleted, and thus does not have to appear in t.

The results of the 3 attempts are strored in the array opt, and the smallest value of the 3 is kept as shortest distance for s[1..i] and t[1..j].

The decrementations of indices is either because the corresponding string elements match, or because they have been taken into account by an edit operation.

Best Answer from StackOverflow

Question Source :

3.2K people like this

 Download Related Notes/Documents


Post a Comment

Let us know your responses and feedback