Page 1 of 1

Suffix Arrays

Posted: Mon Nov 20, 2006 5:50 pm
by barlow
Hi, I want to build the suffix array of a given string, for instance if the string is abcd then the suffix array is abcd, bcd, cd, d. To build it I concatenate strings but it turns out that this is very slow operation. Can you help me to improve it ?

Posted: Sat Dec 30, 2006 5:42 pm
by ajaysomani
I think you don't actually need to store all the suffixes separately. What you can do is just store the indexes in a single dimentional array and whenever you have to access any particular suffix just use the position stored in the suffix array to do that. A very naive approach to build the suffix array might look something like this ..

Code: Select all

/* Input a string and build its suffix array */
#include<cstdio>
#include<cstring>
#include<cstdlib>
#define MAXLEN 100000
char s[MAXLEN+1];
int suffixArray[MAXLEN],n;
int compare(const void *p1,const void *p2){
        return strcmp(s+*(int *)p1,s+*(int *)p2);
}
int main(){
        scanf("%s",s);
        n = strlen(s);
        for(int i=0;i<n;i++){
                suffixArray[i] = i;         }
        qsort(suffixArray,n,sizeof(int),compare);   
        // Now you can use the suffix array in the application you want to use ..
        return 0;
}


Posted: Sun Dec 31, 2006 6:22 pm
by fh
ajaysomani, in your code the MAXLEN is set to 100000 but in fact it can only handle up to 1000 (if the timelimit is 1 second) since your code will run in O ( N^2 log N ). You will need a more sophisticated way to build Suffix Array in O ( N log N ) to handle 100000 data.

Posted: Mon Jan 01, 2007 11:21 am
by ajaysomani
Hi ,
Well yeah the order is kind of O(n^2 * log n ) but the point is that not every comparison while sorting takes O(N) time. So the time taken will be very less in practice ( most cases ). And moreover if the string is sufficiently dissimilar in its characters then this method would run in reasonable amount of time and can Handle upto 10000. [ I should have one less 0 in the declaration there :D ].

PS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
PS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.

http://www.cs.helsinki.fi/u/tpkarkka/

Posted: Mon Jan 01, 2007 5:34 pm
by fh
PS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
Then, the judge's test data must be lacking of a test case like: A^50000. You can try how long to build a suffix array using qsort above if the test case is AAAAAA.... (A^50000). It should take forever :P

PS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.
What I meant was if you are going to solve it in O( N log N ) then you need a more sophisticated approach, not just qsort :wink:

Actually, there exist a linear O(N) suffix array construction by Juha K

Posted: Mon Jan 01, 2007 6:51 pm
by ajaysomani
Hmmmmmm ..... You are correct. I too think that I am lucky enough to get accepted. Anyway for normal contests I don't think that anybody tries linear time suffix array construction. [ I suppose it would be complicated :D ].

http://www.cs.helsinki.fi/u/tpkarkka/

Posted: Mon Jan 01, 2007 7:25 pm
by fh
PS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
Then, the judge's test data must be lacking of a test case like: A^50000. You can try how long to build a suffix array using qsort above if the test case is AAAAAA.... (A^50000). It should take forever :P

PS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.
What I meant was if you are going to solve it in O( N log N ) then you need a more sophisticated approach, not just qsort :wink:

Actually, there exist a linear O(N) suffix array construction by Juha Karkkainen and Peter Sanders. If you want to learn more about it, visit his site at:

http://www.cs.helsinki.fi/u/tpkarkka/

And find his publication about "Simple linear work suffix array construction." You can download his paper (in pdf) and the source code (in C).

In his paper he said that his skew algorithm is simpler than many best algorithms on Suffix Array (I find it also simpler :D).

(edit: phpBB seemed doesn't support unicode characters, that made my post truncated. The-non unicode character is removed and is now fixed)

Posted: Mon Jan 01, 2007 7:38 pm
by fh
Anyway for normal contests I don't think that anybody tries linear time suffix array construction. [ I suppose it would be complicated ].
Not at all... His code is only 50 lines long. It's not something that cannot be done in normal programming contest :P Also, the idea is simple enough.

In fact, at timus online judge there is a problem that requires you to build a Suffix Array/Tree/Automaton in O( N ), otherwise TLE:

http://acm.timus.ru/problem.aspx?space=1&num=1517

You can see that Suffix Array becomes one of the mandatory library code for those who do programming contest :wink: