10829  LGap Substrings
Moderator: Board moderators

 New poster
 Posts: 15
 Joined: Thu Oct 16, 2003 7:43 pm
 Location: M
 Contact:
10829  LGap Substrings
Any hint for solving this problem ??
In the worst case (a string with 50,000 same character), the solution is greater than 500,000,000 LGAP. O.K. We can manage that case.
I think, even using KMP's algrithm is not possible count all LGAPs in time
Please, help.
In the worst case (a string with 50,000 same character), the solution is greater than 500,000,000 LGAP. O.K. We can manage that case.
I think, even using KMP's algrithm is not possible count all LGAPs in time
Please, help.

 New poster
 Posts: 15
 Joined: Thu Oct 16, 2003 7:43 pm
 Location: M
 Contact:
A gGap is defined as a string in the form UVU such that V = g.
For example: AALLAA is a 2Gap taking U = AA and V = LL.
furthermore is 4GAP because of U = A and V = ALLA and V = 4,
finally is not a 6GAP, because U can not be empty.
The problem consists in given a string S, and a positive integer G, count all substrings S' in S such that S' is a GGap.
For example: AALLAA is a 2Gap taking U = AA and V = LL.
furthermore is 4GAP because of U = A and V = ALLA and V = 4,
finally is not a 6GAP, because U can not be empty.
The problem consists in given a string S, and a positive integer G, count all substrings S' in S such that S' is a GGap.

 New poster
 Posts: 19
 Joined: Thu May 02, 2002 5:47 am
 Location: Zhongshan (Sun Yatsen) University

 Guru
 Posts: 724
 Joined: Wed Dec 19, 2001 2:00 am
 Location: Germany
I only solved this problem because for an bioinformatics course I had read a paper which described an algorithm how to find branching tandem repeats. I modified this algorithm for this problem. I know that the problemsetter had a different solution, he told me it is something with divide and conquer and extended kmp (knuth morris pratt algorithm) (but I don't know what he meant with extended).
Re: 10829  LGap Substrings
to Adrian: did you store all the (i,D) pairs  Lgap tandem repeats  explicitly? I'm going to use an O(nlogn) algo for branching tandem repeats
Re: 10829  LGap Substrings
Never mind, the question above is silly: we are asked to find the number of occurences, not the number of distinct substrings.
If there is ever a war between men and machines, it is easy to guess who will start it (c) Arthur Clarke
Re: 10829  LGap Substrings
I find all branching tandem repeats with gap 0 <= g <= L and length > Lg in O(L*n*log(n)).
Specifically, for my code finds pairs
but fails to find because the corresponding tandem repeat (g = 0) is not branching.
The question is: how to handle nonbranching repeats in O(nlogn)?
Specifically, for
Code: Select all
bbaabaaaaa$
Code: Select all
(6,10), (3,7), (8,10), (4,6), (7,9) and (2,6)
Code: Select all
(6,8),
The question is: how to handle nonbranching repeats in O(nlogn)?
Re: 10829  LGap Substrings
for the testcase the output size is
which amounts to O(n^2). Now, given we can find all branching tandem repeats in O(nlogn), how
to handle nonbranching tandem repeats?
Code: Select all
s = a^n
Code: Select all
(nL1) + (nL3) + (nL5) + ...,
to handle nonbranching tandem repeats?
If there is ever a war between men and machines, it is easy to guess who will start it (c) Arthur Clarke
Re: 10829  LGap Substrings
Well, with Adrian's help I managed to get AC with an O(GAPn(logn)^2) algo, and I can't see as yet the way of shortening this to O(nlogn).
My code is 580 lines long, pure C code.
Many thanks to Adrian and to Rujia!
My code is 580 lines long, pure C code.
Many thanks to Adrian and to Rujia!
If there is ever a war between men and machines, it is easy to guess who will start it (c) Arthur Clarke

 New poster
 Posts: 1
 Joined: Wed Oct 31, 2012 5:21 pm
Re: 10829  LGap Substrings
can anyone give me some hint for this problem?i don't know how to solve this? as i think ,may be suffix array,but i don't know how to deal with the gap?can anyone help me? thanks very much! my email:luyuncheng@sina.com