Suffix Arrays
Moderator: Board moderators
Suffix Arrays
Hi, I want to build the suffix array of a given string, for instance if the string is abcd then the suffix array is abcd, bcd, cd, d. To build it I concatenate strings but it turns out that this is very slow operation. Can you help me to improve it ?

 New poster
 Posts: 5
 Joined: Sat Dec 30, 2006 2:16 pm
 Location: Hyderabad
 Contact:
I think you don't actually need to store all the suffixes separately. What you can do is just store the indexes in a single dimentional array and whenever you have to access any particular suffix just use the position stored in the suffix array to do that. A very naive approach to build the suffix array might look something like this ..
Code: Select all
/* Input a string and build its suffix array */
#include<cstdio>
#include<cstring>
#include<cstdlib>
#define MAXLEN 100000
char s[MAXLEN+1];
int suffixArray[MAXLEN],n;
int compare(const void *p1,const void *p2){
return strcmp(s+*(int *)p1,s+*(int *)p2);
}
int main(){
scanf("%s",s);
n = strlen(s);
for(int i=0;i<n;i++){
suffixArray[i] = i; }
qsort(suffixArray,n,sizeof(int),compare);
// Now you can use the suffix array in the application you want to use ..
return 0;
}
People die young because god loves them so much, I am still on earth because there is a godess here who loves me even more.
ajaysomani, in your code the MAXLEN is set to 100000 but in fact it can only handle up to 1000 (if the timelimit is 1 second) since your code will run in O ( N^2 log N ). You will need a more sophisticated way to build Suffix Array in O ( N log N ) to handle 100000 data.
Visit my script to hunt UVA problem here:
http://felixhalim.net/uva/hunting/

Felix Halim
http://felixhalim.net/uva/hunting/

Felix Halim

 New poster
 Posts: 5
 Joined: Sat Dec 30, 2006 2:16 pm
 Location: Hyderabad
 Contact:
Hi ,
Well yeah the order is kind of O(n^2 * log n ) but the point is that not every comparison while sorting takes O(N) time. So the time taken will be very less in practice ( most cases ). And moreover if the string is sufficiently dissimilar in its characters then this method would run in reasonable amount of time and can Handle upto 10000. [ I should have one less 0 in the declaration there ].
PS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
PS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.
Well yeah the order is kind of O(n^2 * log n ) but the point is that not every comparison while sorting takes O(N) time. So the time taken will be very less in practice ( most cases ). And moreover if the string is sufficiently dissimilar in its characters then this method would run in reasonable amount of time and can Handle upto 10000. [ I should have one less 0 in the declaration there ].
PS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
PS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.
People die young because god loves them so much, I am still on earth because there is a godess here who loves me even more.
http://www.cs.helsinki.fi/u/tpkarkka/
Then, the judge's test data must be lacking of a test case like: A^50000. You can try how long to build a suffix array using qsort above if the test case is AAAAAA.... (A^50000). It should take foreverPS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
What I meant was if you are going to solve it in O( N log N ) then you need a more sophisticated approach, not just qsortPS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.
Actually, there exist a linear O(N) suffix array construction by Juha K
Visit my script to hunt UVA problem here:
http://felixhalim.net/uva/hunting/

Felix Halim
http://felixhalim.net/uva/hunting/

Felix Halim

 New poster
 Posts: 5
 Joined: Sat Dec 30, 2006 2:16 pm
 Location: Hyderabad
 Contact:
Hmmmmmm ..... You are correct. I too think that I am lucky enough to get accepted. Anyway for normal contests I don't think that anybody tries linear time suffix array construction. [ I suppose it would be complicated ].
People die young because god loves them so much, I am still on earth because there is a godess here who loves me even more.
http://www.cs.helsinki.fi/u/tpkarkka/
Then, the judge's test data must be lacking of a test case like: A^50000. You can try how long to build a suffix array using qsort above if the test case is AAAAAA.... (A^50000). It should take foreverPS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
What I meant was if you are going to solve it in O( N log N ) then you need a more sophisticated approach, not just qsortPS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.
Actually, there exist a linear O(N) suffix array construction by Juha Karkkainen and Peter Sanders. If you want to learn more about it, visit his site at:
http://www.cs.helsinki.fi/u/tpkarkka/
And find his publication about "Simple linear work suffix array construction." You can download his paper (in pdf) and the source code (in C).
In his paper he said that his skew algorithm is simpler than many best algorithms on Suffix Array (I find it also simpler ).
(edit: phpBB seemed doesn't support unicode characters, that made my post truncated. Thenon unicode character is removed and is now fixed)
Visit my script to hunt UVA problem here:
http://felixhalim.net/uva/hunting/

Felix Halim
http://felixhalim.net/uva/hunting/

Felix Halim
Not at all... His code is only 50 lines long. It's not something that cannot be done in normal programming contest Also, the idea is simple enough.Anyway for normal contests I don't think that anybody tries linear time suffix array construction. [ I suppose it would be complicated ].
In fact, at timus online judge there is a problem that requires you to build a Suffix Array/Tree/Automaton in O( N ), otherwise TLE:
http://acm.timus.ru/problem.aspx?space=1&num=1517
You can see that Suffix Array becomes one of the mandatory library code for those who do programming contest
Visit my script to hunt UVA problem here:
http://felixhalim.net/uva/hunting/

Felix Halim
http://felixhalim.net/uva/hunting/

Felix Halim