Thue showed that there exist arbitrarily long square-free strings over an alphabet of three symbols (this is not true for two symbols). A generalization of Thue’s original result was posed as an open problem: given an alphabet list L = L1, . . . , Ln, where |Li| = 3, is it always possible to find a square-free string w = w1w2 . . . wn, where wi ∈ Li? In this paper we show that squares can be forced on square-free strings over alphabet lists iff a suffix of the square-free string conforms to a pattern which we term an offending suffix. We also prove properties of offending suffixes. However, the problem remains tantalizingly open.
1. Forced repetitions over alphabet lists
Neerja Mhaskar and Michael Soltys
PSC 2016
August 31, 2016
Forced - Mhaskar & Soltys PSC 2016 Title - 1/19
2. Thank you Franya Franek for presenting for us at PSC2015
Logical foundations of String Algorithms — a formalization
3. Summary
Thue’s 1906 problem: construct a word of arbitrary length over
a ternary alphabet without repetitions.
Grytczuk in 2010 asked if the same holds true over an
alphabet list, and showed that the answer is “yes” over alphabet
lists where each alphabet has at least 4 symbols.
The problem is still open for alphabet lists with 3 symbols per
alphabet. We show several partial results.
4. Thue’s problem and solution
$\Sigma_3 = \{1, 2, 3\}$
morphism:
$$S = \begin{cases} 1 \to 12312 \\ 2 \to 131232 \\ 3 \to 1323132 \end{cases} \qquad (1)$$
Given a string $w \in \Sigma_3^*$, we let $S(w)$ denote $w$ with every symbol
replaced by its corresponding substitution:
$$S(w) = S(w_1 w_2 \dots w_n) = S(w_1)S(w_2) \dots S(w_n).$$
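As a small illustration of how the morphism is used, the sketch below applies S repeatedly starting from the single symbol 1 and checks the result for squares by brute force. The helper names `apply_morphism` and `has_square` and the starting word are assumptions made for this example; they do not come from the slides.

```python
# Thue's morphism S from the slide, written as a substitution table.
S = {"1": "12312", "2": "131232", "3": "1323132"}

def apply_morphism(w: str) -> str:
    """Return S(w) = S(w1)S(w2)...S(wn)."""
    return "".join(S[c] for c in w)

def has_square(w: str) -> bool:
    """True if w contains a non-empty factor of the form xx (brute force)."""
    n = len(w)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if w[i:i + half] == w[i + half:i + 2 * half]:
                return True
    return False

# Iterate the morphism a few times; each image is longer than the last.
w = "1"
for _ in range(3):
    w = apply_morphism(w)

print(len(w), has_square(w))
```

The brute-force check is cubic in the length of the word, which is fine for a demonstration but not for long iterates.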
5. Grytczuk’s problem
Let:
L = L1, L2, . . . , Ln ,
be an ordered list of (finite) alphabets.
We say that w is a string over the list L if w = w1w2 . . . wn where
for all i, wi ∈ Li .
We impose no conditions on the Li ’s: they may be equal, disjoint,
or have elements in common.
The only condition on w is that the i-th symbol of w must be
selected from the i-th alphabet, i.e., wi ∈ Li .
6. If |Li | ≥ 4, then no matter what the list is, there is always a
square-free string over such a list.
This can be shown with a probabilistic algorithm:
in its i-th iteration, it selects randomly a symbol from Li ,
and continues if the string created thus far is square-free,
and otherwise deletes the suffix consisting of the right
side of the square it just created, and restarts from the
appropriate position.
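The procedure just described can be sketched as follows. This is a minimal reading of the slide, not the authors' code: the function names, the representation of each alphabet as a string of symbols, and the `max_steps` guard are all assumptions added for the example.

```python
import random

def square_suffix_half(w):
    """Return k if w ends in a square xx with |x| = k (smallest such k), else 0."""
    n = len(w)
    for k in range(1, n // 2 + 1):
        if w[n - k:] == w[n - 2 * k:n - k]:
            return k
    return 0

def has_square(w):
    """Brute-force check for any factor xx in w."""
    return any(square_suffix_half(w[:j]) for j in range(2, len(w) + 1))

def random_square_free(lists, rng, max_steps=1_000_000):
    """Pick w[i] at random from lists[i]; on creating a square xx,
    delete its right half x and resume from the shortened string."""
    w = []
    steps = 0
    while len(w) < len(lists):
        steps += 1
        if steps > max_steps:  # hypothetical safety guard, not in the slides
            raise RuntimeError("step budget exhausted")
        w.append(rng.choice(lists[len(w)]))
        k = square_suffix_half(w)
        if k:
            del w[-k:]  # what remains is a prefix of a square-free string
    return w

lists = ["abcd", "bcde", "acef", "abdf"] * 8  # 32 alphabets of size 4
word = random_square_free(lists, random.Random(0))
print("".join(word))
```

Because the string is square-free before each new symbol is appended, any square that appears must be a suffix, so it is enough to test suffixes after every step.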
7. Grytczuk proves by a simple counting argument that with positive
probability the length of a constructed sequence exceeds any finite
bound, provided the number of symbols is at least 4.
⇒ This implies the existence of arbitrarily long square-free strings.
Note that this argument is non-constructive (or rather, it yields an
exponential time procedure).
Thus it is a weaker result than the method employed in Thue’s
original problem.
This approach relies on Moser’s algorithmic proof of the Lovász Local
Lemma: the entropy compression argument.
8. Open Problem 1:
Is there a polytime procedure for computing a square-free string
from a given list of length n?
9. List of 3 symbol alphabets
L = L1, L2, . . . , Ln, where |Li | = 3.
We can prove it for restricted types of such lists:
L has an SDR (System of Distinct Representatives)
L has the union property
L has a consistent mapping (testing for one is NP-hard)
L is a partition
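To make the first case concrete: if L has an SDR, the representatives are pairwise distinct, so the string they form repeats no symbol and is therefore square-free. The sketch below looks for an SDR with a standard augmenting-path bipartite matching; the slides do not say how an SDR is found, so the function `find_sdr` and its approach are assumptions made for illustration.

```python
def find_sdr(lists):
    """Return [r_1, ..., r_n] with r_i in lists[i] and all r_i distinct,
    or None if no system of distinct representatives exists."""
    owner = {}  # symbol -> index of the alphabet it currently represents

    def try_assign(i, seen):
        # Try to give alphabet i a representative, re-routing others if needed.
        for s in lists[i]:
            if s in seen:
                continue
            seen.add(s)
            if s not in owner or try_assign(owner[s], seen):
                owner[s] = i
                return True
        return False

    for i in range(len(lists)):
        if not try_assign(i, set()):
            return None
    reps = [None] * len(lists)
    for s, i in owner.items():
        reps[i] = s
    return reps

reps = find_sdr(["abc", "abc", "bcd", "cde"])
print(reps)
```

An SDR can only exist when the list is no longer than the union of its alphabets, so this covers short lists only.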
10. Offending suffix pattern
Let $C(n)$, an offending suffix, be a pattern defined recursively:
$C(1) = X_1a_1X_1$, and for $n > 1$,
$C(n) = X_nC(n-1)a_nX_nC(n-1)$.
Zimin words . . . !
11. We are interested in:
$C(3) = X_3X_2X_1a_1X_1a_2X_2X_1a_1X_1a_3X_3X_2X_1a_1X_1a_2X_2X_1a_1X_1$,
$C_s(3) = a_1a_2a_1a_3a_1a_2a_1$,
and observe that $C_s(3)a_i$, for $i = 1, 2, 3$, all map to strings with
squares.
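The observation can be checked mechanically. The sketch below instantiates the recursive pattern with chosen strings for the variables and letters, then appends each letter in turn; the function names and the concrete choices a, b, c and x, y, z are assumptions made for the example.

```python
def offending(n, xs, letters):
    """Instantiate C(n) with X_i = xs[i-1] and a_i = letters[i-1]."""
    if n == 1:
        return xs[0] + letters[0] + xs[0]
    inner = offending(n - 1, xs, letters)
    return xs[n - 1] + inner + letters[n - 1] + xs[n - 1] + inner

def has_square(w):
    """Brute-force check for a non-empty factor of the form xx."""
    n = len(w)
    return any(
        w[i:i + h] == w[i + h:i + 2 * h]
        for i in range(n)
        for h in range(1, (n - i) // 2 + 1)
    )

# Shortest instance: every X_i is the empty string.
cs3 = offending(3, ["", "", ""], "abc")
print(cs3, [has_square(cs3 + a) for a in "abc"])

# An instance with non-empty X_i; appending any a_i still closes a square.
c3 = offending(3, ["x", "y", "z"], "abc")
print(c3, [has_square(c3 + a) for a in "abc"])
```

In both instances, appending a_3 yields the square (X_3 C(2) a_3)(X_3 C(2) a_3), and appending a_1 or a_2 closes a square on the copy of C(1) or C(2) at the end of the string.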
12. Result
If w is a square-free string over L = L1, L2, . . . , Ln−1, then
Ln = {a, b, c} forces a square iff w has a suffix conforming to C(3).
The proof of this result is most of the PSC2016 paper.
The ⇐ direction is trivial.
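13. ⇒ As the three squares tata, ubub, vcvc are suffixes of wa, wb, wc
respectively, tat, ubu, vcv are all suffixes of the string w. It follows
that t, u, v must be of different sizes (two suffixes of w of equal length
would coincide, forcing two of a, b, c to be equal), and so we can
order them without loss of generality as follows:
|tat| < |ubu| < |vcv|.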
We now consider different cases of the overlap of tat, ubu, vcv,
showing in each case that the resulting string has a suffix
conforming to the pattern C(3).
14. For instance:
v = pubu, where p is a proper non-empty prefix of v.
Since w is square-free, pubu has no square, and
therefore p ≠ u and p ≠ b.
From this, we get vcv = pubucpubu. Therefore, this case is
possible.
[Figure: alignment of p and ubu inside vcv]
15. Open Problem 2:
Show that for any L = L1, L2, . . . , Ln, where |Li | = 3, there always
exists a square-free w over L.
Problem: very difficult to run meaningful simulations, as the
number of cases jumps up very quickly.
Hope: can we use our offending suffix characterization to do a
counting argument?
16. Open Problem 3:
Map $L = \bigcup_i L_i \to [n]$ where $n = |\bigcup_i L_i|$.
Let $M_L = (m_{ij})$ where $m_{ij} = 1 \iff j \in L_i$.
So each row of $M_L$ will have exactly three 1s.
Problem: given any such matrix, can we select a single 1 from
each row in such a way that there are no 2k consecutive rows
where the selections in the initial k rows equal those in the next k rows?
How to run these simulations for n > 10?
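For small instances the matrix formulation can be explored with a plain backtracking search, sketched below. It is only an illustration of the problem statement: the function name `select_square_free` and the example matrix are assumptions, and the search is exponential in the worst case, which is the difficulty the slide points to.

```python
def select_square_free(matrix):
    """matrix: list of 0/1 rows. Return one chosen column index per row such
    that the sequence of choices contains no square, or None if impossible."""
    n = len(matrix)
    chosen = []

    def ends_in_square():
        # Earlier prefixes were already checked, so only suffixes matter.
        m = len(chosen)
        return any(chosen[m - k:] == chosen[m - 2 * k:m - k]
                   for k in range(1, m // 2 + 1))

    def extend():
        if len(chosen) == n:
            return True
        for j, bit in enumerate(matrix[len(chosen)]):
            if bit:
                chosen.append(j)
                if not ends_in_square() and extend():
                    return True
                chosen.pop()
        return False

    return chosen if extend() else None

# Eight rows over four columns, each row with exactly three 1s.
rows = [[1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1]] * 2
print(select_square_free(rows))
```

Rows with a single 1 are allowed here only to show the failure case; two identical such rows in a row force a square and the search returns None.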
17. Crossing sequences
A clever technique in complexity to show:
Palindromes require Θ(n²) steps on a single-tape Turing
machine.
If a language can be decided with o(log log n) memory,
it is in fact regular: “minuscule memory = no
memory.”
We use this to show that we can always avoid very large
repetitions (1/c, c ∈ N) over any list.
18. Perl
Truly sophisticated text manipulation libraries.
CGI.pm is a Perl module for CGI web applications — the workhorse
of the Internet.
The text processing is done with state-of-the-art algorithms from
. . . the 1970s
The advances in String Algorithms should be reflected in popular
implementations.