Cheap and Secure Web Hosting Provider : See Now

# [Solved]: Represent string as concatenations

, ,
Problem Detail:

If $S_1,S_2$ are set of strings, then $S_1S_2 = \{s_1s_2|s_1\in S_1, s_2\in S_2\}$. $S^0=\{\epsilon\}$, $\epsilon$ is the empty string. $S^n = S^{n-1}S$.

Two related problems about represent string as concatenation of other strings.

1. Given a finite set $S$ of strings, how to decide if there exist a string can be written as concatenations of elements in $S$ in two different ways?

2. Given a finite set $S$ of strings and $n$, how can one compute the smallest set of strings $T$, such that $S\subset T^n$?

(Bonus: what about infinite $S$, at least when it's regular? For the second problem when $S$ is infinite, we might ask to find a minimal $T$ under set inclusion.)

#### Answered By : Hendrik Jan

I will try (1) and go for infinite (the bonus).

The finite case is called the code problem: finding possible decompositions is like decoding. Its classical algorithm is called Sardinas–Patterson. It is not immediately clear it is a finite search space. The lengths of the decomposition might be long. Still it is not Post Correspondence as that has additional restrictions.

The finite case has a simple regular language solution: for every pair of distinct $s,t \in S$ test whether the regular set $s S^* \cap t S^*$ is empty. (The Sardinas-Patterson tries that "in parallel".) This does not work in the infinite case as there would be infinitely many $s,t$ to test.

In the infinite (but regular) case I would build a new finite state automaton based on an automaton for $S$. It simulates in parallel the two different decompositions by keeping track of a pair of states, one for each decomposition. On a letter both pairs make a step on the same letter. In a final state each component may either restart (it has seen an element in $S$) or proceed (trying to find a longer word in $S$). Now we need one additional boolean addition to the state-pair as we need to check whether the two components made different choices in at least one point. Accept if both simulations end in a final state (and we have seen a point where the two decompositions differed). Now again there are no two different decompositions for the same string iff the language of the automaton is empty.

And indeed, for context-free the problem must be undicidable, but I do not have a reference at hand now.