Circular String Linearization
Moderator: Board moderators
Circular String Linearization
This is a problem mentioned in Dan Gusfied 'Algorithms on Strings Tress and Sequences'
(page 148 - problem is named APL12). There the author says it could be easily solved
if we build a suffix tree.
Glass Beads - UVA Problem 719
http://icpcres.ecs.baylor.edu/onlinejud ... 7/719.html
is exactly the same problem.
Still I don't think suffix tree is possible for this problem just because the
input string can be up to 10,000 chars. So it would be too much memory
to build a suffix tree for such a string (I think).
My first question is: what do you think about that?
Would you agree that a suffix tree there might take
too much memory?
Of course the same question applies if we decide to use Trie - which
is basically a less optimized version of the Suffix Tree
(can be regarded as such, at least, in a sense).
Second thing I want to ask is the following.
There are almost no hints on problem 719 in this forum.
The only three notes which I see are here on the following location.
http://acm.uva.es/board/viewtopic.php?f ... 122ce2a20f
They are basically
* a suggestion to use suffix tree (which I think will not
even work due to memory constraints)
* a suggestion (from Adrian Kuegel) which says
"You don't need any advanced data structure for this
problem. Hint: think of something similar to BFS."
* a statement: this problem can be solved in O(n)
I thought for quite some time but can't find a way yet to
transform this problem into something similar to BFS. I mean
I see a direct transformation but then the worst-case complexity
will still be O(N^2) /in the way I think about it/.
So my second question is: can somebody provide
a slightly greater hint on this problem (I don't
want a spoiler just a slightly greater hint)?
You may send me a personal
message if you prefer so.
Many thanks in advance.
(page 148 - problem is named APL12). There the author says it could be easily solved
if we build a suffix tree.
Glass Beads - UVA Problem 719
http://icpcres.ecs.baylor.edu/onlinejud ... 7/719.html
is exactly the same problem.
Still I don't think suffix tree is possible for this problem just because the
input string can be up to 10,000 chars. So it would be too much memory
to build a suffix tree for such a string (I think).
My first question is: what do you think about that?
Would you agree that a suffix tree there might take
too much memory?
Of course the same question applies if we decide to use Trie - which
is basically a less optimized version of the Suffix Tree
(can be regarded as such, at least, in a sense).
Second thing I want to ask is the following.
There are almost no hints on problem 719 in this forum.
The only three notes which I see are here on the following location.
http://acm.uva.es/board/viewtopic.php?f ... 122ce2a20f
They are basically
* a suggestion to use suffix tree (which I think will not
even work due to memory constraints)
* a suggestion (from Adrian Kuegel) which says
"You don't need any advanced data structure for this
problem. Hint: think of something similar to BFS."
* a statement: this problem can be solved in O(n)
I thought for quite some time but can't find a way yet to
transform this problem into something similar to BFS. I mean
I see a direct transformation but then the worst-case complexity
will still be O(N^2) /in the way I think about it/.
So my second question is: can somebody provide
a slightly greater hint on this problem (I don't
want a spoiler just a slightly greater hint)?
You may send me a personal
message if you prefer so.
Many thanks in advance.
Last edited by Sedefcho on Tue Feb 10, 2009 2:18 pm, edited 1 time in total.
Re: Circular String Linearization
Suffix trees take about 5-10 words per each character in input, so a suffix tree for 10000 characters would need less than half a megabyte - a very modest amount of memory today.Sedefcho wrote:Still I don't think suffix tree is possible for this problem just because the
input string can be up to 10,000 chars. So it would be too much memory
to build a suffix tree for such a string (I think).
And for this problem, I think, suffix arrays are the way to go in contests -- I have a very simple O(n log^2 n) implementation in only 20 lines of C++, and it needs only 16 bytes per char.
There's also O(n) algorithm (I think it's called Duval's algorithm), based on theory of Lyndon words and prime strings, but I didn't look deeply into it.
-
- Learning poster
- Posts: 51
- Joined: Tue Sep 04, 2007 2:12 pm
- Location: Russia, Saratov
- Contact:
Re: Circular String Linearization
There are lots of approaches to solve this problem.
First of all, hashes. Indeed, we can compute LCP (longest common prefix) of any two suffixes S[i..] and S[j..] of given string (we can do this be binary search in O(log N), N = S.length). After that we can compare lexicographically any two suffixes S[i..] and S[j..] in O(log N) (we should find LCP and compare s[i+LCP] and s[j+LCP]). Finally, for given string S we can find it's minimal cyclic shift by taking string T=S+S and comparing all suffixes 0..N-1, so O(N logN) with quite easy algorithm.
Sample implementation:
Second, Duval. For given string S of length N it finds in O(N) time and O(1) memory its Lyndon decomposition, i.e. decomposition into substrings S1,S2,...,Sk such that S1+...+Sk = S and S1>=S2>=...>=Sk and each Si is a simple string (every its suffix is lexicographically strictly smaller than Si itself). Sounds very hard, but the algorithms is very easy. It's almost a greed algorithm, but with some back-tracking, and nevertheless it works in linear time (more precisely, no more than 4N-3 char comparisons).
Implementation of Duval:
If you are interested, I can describe here Duval's algorithm and how it works.
Then, if we know Lyndon decomposition for a string S+S, then we can find minimal cyclic shift as a start position of last block of same simple substrings, placed before N:
Third, there are some approaches to minimal cyclic shift, connected with prefix-function or z-function. They should be also very simple, but unfortunately I don't know them 
First of all, hashes. Indeed, we can compute LCP (longest common prefix) of any two suffixes S[i..] and S[j..] of given string (we can do this be binary search in O(log N), N = S.length). After that we can compare lexicographically any two suffixes S[i..] and S[j..] in O(log N) (we should find LCP and compare s[i+LCP] and s[j+LCP]). Finally, for given string S we can find it's minimal cyclic shift by taking string T=S+S and comparing all suffixes 0..N-1, so O(N logN) with quite easy algorithm.
Sample implementation:
Code: Select all
long long get_hash (const vector<long long> & h, const vector<long long> & pw, int i, int j) {
return ((j>=0 ? h[j] : 0) - (i ? h[i-1] : 0)) * pw[h.length()-i];
}
bool smaller (const string & s, int n, const vector<long long> & h, const vector<long long> & pw, int i, int j) {
int l = 0, r = n;
while (l < r) {
int m = (l + r + 1) >> 1;
if (get_hash (h, pw, i, i+m-1) == get_hash (h, pw, j, j+m-1))
l = m;
else
r = m-1;
}
return s[i+l] < s[j+l];
}
Implementation of Duval:
Code: Select all
string s;
int n = (int) s.length();
int i=0;
while (i < n) {
int j=i+1, k=i;
while (j < n && s[k] <= s[j]) {
if (s[k] < s[j])
k = i;
else
++k;
++j;
}
while (i <= k) {
cout << s.substr (i, j-k) << ' ';
i += j - k;
}
}
Then, if we know Lyndon decomposition for a string S+S, then we can find minimal cyclic shift as a start position of last block of same simple substrings, placed before N:
Code: Select all
string min_cyclic_shift (string s) {
s += s;
int n = (int) s.length();
int i=0, ans=0;
while (i < n/2) {
ans = i;
int j=i+1, k=i;
while (j < n && s[k] <= s[j]) {
if (s[k] < s[j])
k = i;
else
++k;
++j;
}
while (i <= k) i += j - k;
}
return s.substr (ans, n/2);
}

Re: Circular String Linearization
Thanks for these detailed explanations & references, I will try to
have a look and hopefully understand them (or at least some of them).
Thanks a lot.
have a look and hopefully understand them (or at least some of them).
Thanks a lot.
Re: Circular String Linearization
Can you please post it here ?And for this problem, I think, suffix arrays are the way to go in contests -- I have a very simple O(n log^2 n) implementation in only 20 lines of C++, and it needs only 16 bytes per char.
Re: Circular String Linearization
Yes, here it is:
I wrote it for my ICPC team's reference notebook, not specifically for this UVa's problem. And Sedefcho already told me that, unfortunately, this code slightly timeouts on it :(
Also, check this thread on TC forums for another implementation of this same algorithm -
http://forums.topcoder.com/?module=Thre ... 79&start=0
Code: Select all
// Suffix array in $O(n \log^2 n)$ time, 16 bytes/char overhead.
// Input: text, N
// Output:
// sa[] is a sorted list of offsets to all non-empty suffixes,
// lcp[i] = length of longest common prefix of text+sa[i] and text+sa[i+1]
char text[MAX]; long long key[MAX]; int N, sa[MAX], rank[MAX], *lcp=(int*)key;
struct Cmp { bool operator()(int i, int j) const { return key[i]<key[j]; } };
void build() {
for (int i = 0; i < N; i++) { sa[i] = i; key[i] = text[i]; }
sort(sa, sa+N, Cmp());
for (int K = 1; ; K *= 2) {
// invariant: suffixes are now sorted by their first K characters
for (int i = 0; i < N; i++)
rank[sa[i]] = i>0 && key[sa[i-1]]==key[sa[i]] ? rank[sa[i-1]] : i;
if (K >= N) break;
// sort suffixes by key (rank of this suffix, rank of suffix K chars ahead)
for (int i = 0; i < N; i++)
key[i] = rank[i] * (N+1LL) + (i+K < N ? rank[i+K]+1 : 0);
sort(sa, sa+N, Cmp());
// now they're sorted by their first 2K characters
}
// Kasai's algorithm for LCP array:
for (int i = 0, k = 0; i < N; i++) {
if (k > 0) k--;
if (rank[i] == N-1) { lcp[N-1] = -1; k = 0; continue; }
int j = sa[rank[i]+1];
while (text[i+k] == text[j+k]) k++;
lcp[rank[i]] = k;
}
}
Also, check this thread on TC forums for another implementation of this same algorithm -
http://forums.topcoder.com/?module=Thre ... 79&start=0
Re: Circular String Linearization
thanks,mf wrote:Y
I wrote it for my ICPC team's reference notebook
If they are commented like above, do you have them online somewhere or can you send the rar to tryitn1@gmail.com. Would certainly be useful.
Re: Circular String Linearization
I've put my old ACM ICPC notebook online here.
But the codes there have a lot less comments, though.
Edited on 2010/10/02: updated link.
But the codes there have a lot less comments, though.
Edited on 2010/10/02: updated link.
Re: Circular String Linearization
[quote="mf"]Yes, here it is:
The above algorithm assumes that LCP of 2 consecutive suffixes j,j+1 (none of them rank ==N-1) has atleast LCP of ( j-1,j ) -1 in common . Why is this true ?
Code: Select all
// Suffix array in $O(n \log^2 n)$ time, 16 bytes/char overhead.
// Input: text, N
// Output:
// sa[] is a sorted list of offsets to all non-empty suffixes,
// lcp[i] = length of longest common prefix of text+sa[i] and text+sa[i+1]
char text[MAX]; long long key[MAX]; int N, sa[MAX], rank[MAX], *lcp=(int*)key;
struct Cmp { bool operator()(int i, int j) const { return key[i]<key[j]; } };
void build() {
for (int i = 0; i < N; i++) { sa[i] = i; key[i] = text[i]; }
sort(sa, sa+N, Cmp());
for (int K = 1; ; K *= 2) {
// invariant: suffixes are now sorted by their first K characters
for (int i = 0; i < N; i++)
rank[sa[i]] = i>0 && key[sa[i-1]]==key[sa[i]] ? rank[sa[i-1]] : i;
if (K >= N) break;
// sort suffixes by key (rank of this suffix, rank of suffix K chars ahead)
for (int i = 0; i < N; i++)
key[i] = rank[i] * (N+1LL) + (i+K < N ? rank[i+K]+1 : 0);
sort(sa, sa+N, Cmp());
// now they're sorted by their first 2K characters
}
// Kasai's algorithm for LCP array:
for (int i = 0, k = 0; i < N; i++) {
if (k > 0) k--;
if (rank[i] == N-1) { lcp[N-1] = -1; k = 0; continue; }
int j = sa[rank[i]+1];
while (text[i+k] == text[j+k]) k++;
lcp[rank[i]] = k;
}
}
Re: Circular String Linearization
That's not true.The above algorithm assumes that LCP of 2 consecutive suffixes j,j+1 (none of them rank ==N-1) has atleast LCP of ( j-1,j ) -1 in common . Why is this true ?
What's a true fact here is a bit more complex to formulate. But just for example, take the string "abcdabcz". The first suffix is "abcdabcz", its LCP is 3, because the next lexicographically larger suffix (next suffix in suffix array) is "abcz". And so we can deduce that LCP of 2nd suffix - "bcdabcz" is at least 2, because there's the suffix "bcz" (just drop the first letter of "abcz").
And I already told you who the author of the algorithm is, why didn't you just google for the paper?
Re: Circular String Linearization
The link doesn't seem to work anymore. Could you please share it again? Or send me by e-mail to mr_mogers@sapo.ptmf wrote:I've put my old ACM ICPC notebook online here.
But the codes there have a lot less comments, though.
Thank you
Re: Circular String Linearization
Sorry for delay.
I've put most of my ACM ICPC-related stuff here - http://github.com/infnty/acm/
My old team notebook is at http://github.com/infnty/acm/raw/master ... tebook.pdf
I've put most of my ACM ICPC-related stuff here - http://github.com/infnty/acm/
My old team notebook is at http://github.com/infnty/acm/raw/master ... tebook.pdf
-
- Experienced poster
- Posts: 110
- Joined: Tue May 06, 2008 2:18 pm
- Location: CSE-DU, Bangladesh
- Contact:
Re: Circular String Linearization
probably this is the simplest algorithm:
This is called Minimum Expression introduced by Zhou Yuan.
Code: Select all
/*
Finds alphabetically first representation of a cyclic string in O(length)
*/
minimumExpression(s) {
s = s + s;
len = length(s), i = 0, j = 1, k = 0;
while(i + k < len && j + k < len) {
if(s[i+k] == s[j+k]) k++;
else if(s[i+k] > s[j+k]) { i = i+k+1; if(i <= j) i = j+1; k = 0; }
else if(s[i+k] < s[j+k]) { j = j+k+1; if(j <= i) j = i+1; k = 0; }
}
return min(i, j);
}
You should not always say what you know, but you should always know what you say.