Suffix Arrays

Let's talk about algorithms!

Moderator: Board moderators

Post Reply
barlow
New poster
Posts: 2
Joined: Mon Nov 20, 2006 5:47 pm

Suffix Arrays

Post by barlow »

Hi, I want to build the suffix array of a given string, for instance if the string is abcd then the suffix array is abcd, bcd, cd, d. To build it I concatenate strings but it turns out that this is very slow operation. Can you help me to improve it ?

ajaysomani
New poster
Posts: 5
Joined: Sat Dec 30, 2006 2:16 pm
Location: Hyderabad
Contact:

Post by ajaysomani »

I think you don't actually need to store all the suffixes separately. What you can do is just store the indexes in a single dimentional array and whenever you have to access any particular suffix just use the position stored in the suffix array to do that. A very naive approach to build the suffix array might look something like this ..

Code: Select all

/* Input a string and build its suffix array */
#include<cstdio>
#include<cstring>
#include<cstdlib>
#define MAXLEN 100000
char s[MAXLEN+1];
int suffixArray[MAXLEN],n;
int compare(const void *p1,const void *p2){
        return strcmp(s+*(int *)p1,s+*(int *)p2);
}
int main(){
        scanf("%s",s);
        n = strlen(s);
        for(int i=0;i<n;i++){
                suffixArray[i] = i;         }
        qsort(suffixArray,n,sizeof(int),compare);   
        // Now you can use the suffix array in the application you want to use ..
        return 0;
}

People die young because god loves them so much, I am still on earth because there is a godess here who loves me even more.

fh
Learning poster
Posts: 59
Joined: Wed Jan 19, 2005 3:24 pm
Location: Jakarta
Contact:

Post by fh »

ajaysomani, in your code the MAXLEN is set to 100000 but in fact it can only handle up to 1000 (if the timelimit is 1 second) since your code will run in O ( N^2 log N ). You will need a more sophisticated way to build Suffix Array in O ( N log N ) to handle 100000 data.
Visit my script to hunt UVA problem here:
http://felix-halim.net/uva/hunting/
-----------------------
Felix Halim

ajaysomani
New poster
Posts: 5
Joined: Sat Dec 30, 2006 2:16 pm
Location: Hyderabad
Contact:

Post by ajaysomani »

Hi ,
Well yeah the order is kind of O(n^2 * log n ) but the point is that not every comparison while sorting takes O(N) time. So the time taken will be very less in practice ( most cases ). And moreover if the string is sufficiently dissimilar in its characters then this method would run in reasonable amount of time and can Handle upto 10000. [ I should have one less 0 in the declaration there :D ].

PS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
PS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.
People die young because god loves them so much, I am still on earth because there is a godess here who loves me even more.

fh
Learning poster
Posts: 59
Joined: Wed Jan 19, 2005 3:24 pm
Location: Jakarta
Contact:

http://www.cs.helsinki.fi/u/tpkarkka/

Post by fh »

PS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
Then, the judge's test data must be lacking of a test case like: A^50000. You can try how long to build a suffix array using qsort above if the test case is AAAAAA.... (A^50000). It should take forever :P

PS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.
What I meant was if you are going to solve it in O( N log N ) then you need a more sophisticated approach, not just qsort :wink:

Actually, there exist a linear O(N) suffix array construction by Juha K
Visit my script to hunt UVA problem here:
http://felix-halim.net/uva/hunting/
-----------------------
Felix Halim

ajaysomani
New poster
Posts: 5
Joined: Sat Dec 30, 2006 2:16 pm
Location: Hyderabad
Contact:

Post by ajaysomani »

Hmmmmmm ..... You are correct. I too think that I am lucky enough to get accepted. Anyway for normal contests I don't think that anybody tries linear time suffix array construction. [ I suppose it would be complicated :D ].
People die young because god loves them so much, I am still on earth because there is a godess here who loves me even more.

fh
Learning poster
Posts: 59
Joined: Wed Jan 19, 2005 3:24 pm
Location: Jakarta
Contact:

http://www.cs.helsinki.fi/u/tpkarkka/

Post by fh »

PS 1: I got accepted on http://www.spoj.pl/problems/SUBST1/ using the above code and the constraints are very high. [ 50000 ].
Then, the judge's test data must be lacking of a test case like: A^50000. You can try how long to build a suffix array using qsort above if the test case is AAAAAA.... (A^50000). It should take forever :P

PS 2: Could you please describe the O(n*logn ) approach to build the suffix arrays here.
What I meant was if you are going to solve it in O( N log N ) then you need a more sophisticated approach, not just qsort :wink:

Actually, there exist a linear O(N) suffix array construction by Juha Karkkainen and Peter Sanders. If you want to learn more about it, visit his site at:

http://www.cs.helsinki.fi/u/tpkarkka/

And find his publication about "Simple linear work suffix array construction." You can download his paper (in pdf) and the source code (in C).

In his paper he said that his skew algorithm is simpler than many best algorithms on Suffix Array (I find it also simpler :D).

(edit: phpBB seemed doesn't support unicode characters, that made my post truncated. The-non unicode character is removed and is now fixed)
Visit my script to hunt UVA problem here:
http://felix-halim.net/uva/hunting/
-----------------------
Felix Halim

fh
Learning poster
Posts: 59
Joined: Wed Jan 19, 2005 3:24 pm
Location: Jakarta
Contact:

Post by fh »

Anyway for normal contests I don't think that anybody tries linear time suffix array construction. [ I suppose it would be complicated ].
Not at all... His code is only 50 lines long. It's not something that cannot be done in normal programming contest :P Also, the idea is simple enough.

In fact, at timus online judge there is a problem that requires you to build a Suffix Array/Tree/Automaton in O( N ), otherwise TLE:

http://acm.timus.ru/problem.aspx?space=1&num=1517

You can see that Suffix Array becomes one of the mandatory library code for those who do programming contest :wink:
Visit my script to hunt UVA problem here:
http://felix-halim.net/uva/hunting/
-----------------------
Felix Halim

Post Reply

Return to “Algorithms”