Page 1 of 7

10679 - I Love Strings!!

Posted: Tue Jun 29, 2004 3:12 am
by erdos
I was trying to solve this problem with the naive solution (using strstr).
It gives TLE.
I've created a huge test file and the naive solution solves that huge test file very fast.
So... has the judge a lot of test files ?
I've few motivation to implement a better algorithm when I can't see the naive is slow.

By the way, if strstr is well implemented, a better solution is hard to implement considering there are only 100 queries in a single string, isn't it ?

Re: 10679

Posted: Tue Jun 29, 2004 3:44 am
by rotoZOOM
All naive solution have complexity O(n*m) in worst case,
where n - lenght of base string, and m - length of substring.
What for this problem n=100000 and m=1000, O(n*m)=O(100 000 000) multiple it by number of queries and by number of test cases you get
VERY BIG complexity.
Use O(n+m) algo. For example, Knuth-Morris-Pratt.

-best,
Pavel

Posted: Tue Jun 29, 2004 5:29 am
by cytmike
Is there no other ways to do so other than KMP?

Posted: Tue Jun 29, 2004 8:29 am
by rotoZOOM
cytmike wrote:Is there no other ways to do so other than KMP?
I suppose you may use hash function.
But you have to select very good hash-function.

Posted: Tue Jun 29, 2004 8:44 am
by Larry
This topic is well-researched. I ended up using Turbo Boyer-Moore. I thought it was slightly unforunate that it became a "guess the algorithm" contest, because my first submission was Ukkonen's algorithm with a suffix tree, but even that got TLE.. (during the contest..)

I'm curious as to what kind of algorithm people used..

Posted: Tue Jun 29, 2004 9:12 am
by anupam
Nice problem for KMP.
Thanks to Sajjad bhai :lol:

Posted: Tue Jun 29, 2004 9:24 am
by Adrian Kuegel
Another nice algorithm for this problem is Aho-Corasick. But the tricky thing is that the input seems to contain in some queries duplicate patterns, so the tree has to store the same pattern more than once.

Posted: Tue Jun 29, 2004 9:28 am
by anupam
Adrian, would you please describe the algorithm? it seems new to me. Or would be give me a link please??

Posted: Tue Jun 29, 2004 9:47 am
by Adrian Kuegel
All patterns are stored in one tree. Then failure links are created (that have same meaning as in KMP). And then for each node, the first failure link that points to a pattern is recorded as "output link". This is necessary because some pattern may be sub-pattern of a bigger pattern and would not be found otherwise.
Example:
2 patterns: string, in
the tree looks like this:

Code: Select all

______
|     |
s(1) i(2)
|     |
t(3) n(4)
|
r(5)
|
i(6)
|
n(7)
|
g(8)
I numbered the nodes in bfs order. This order is necessary to build the failure links.
Failure link of some node is failure link of parent + last read character,
or if this is not a node in the tree, it is the failure link of node pointed to by failure link of parent + last read character ... until either a node in the tree is found or the root is reached.
So in the example, there is a failure link from 6 to 2, from 7 to 4, and from all other nodes to the root.
When we are matching a text, it is sufficient to go through the text once.
For each character in the string, we try to follow an edge in the tree. If there is no such edge, follow a failure link (like in KMP). For each reached node, follow the output link in order not to miss a string to output.
Lets search in the string "strings".
We follow the edges in the tree 0->1, 1->3, 3->5, 5->6, 6->7
now we reach output link to 4 and see that we matched the pattern "in".
then we continue with 7 -> 8. Now we have matched the pattern "string".
and then we have to take failure link, because there is no edge labeled 's'.

I think you will also find some information in the internet if you use google.

Posted: Tue Jun 29, 2004 11:01 am
by abishek
i tried rabin karp by hashing the strings modulo some prime numbers.
what ever prime number i tried, i got TLE, and now i have the code Ac in 4.5 secs.
could some one tell me how to efficiently chose the prime in rabin karp

bye
abi

Posted: Tue Jun 29, 2004 4:20 pm
by mido
Uhh..our team solved this during the contest by, guess what, REVERSING the test strings, and then using strstr..the reversing process killed the tricks in the input, according to my teammate.

poor data set

Posted: Wed Jun 30, 2004 9:50 am
by sohel
I applied Boyer's Moore and got AC in 0.806 seconds..

.. but after reversing and using strstr took only 0.189 s

I am waiting for a rejudge eventhough I am currently ranked 2 in this problem.
:-?

book

Posted: Wed Jun 30, 2004 10:59 am
by pminkov
If you can, check out 'Algorithms on Trees and Sequences' -
Aho-Corasick is described there in detail.

Re: poor data set

Posted: Wed Jun 30, 2004 12:19 pm
by Observer
sohel wrote:I applied Boyer's Moore and got AC in 0.806 seconds..

.. but after reversing and using strstr took only 0.189 s

I am waiting for a rejudge eventhough I am currently ranked 2 in this problem.
:-?
Well, I don't think that the "reversing" technique is so bad after all. It is not easy to come up with this idea during contest time! One has to be really smart to think of this. :D

Anyway, it would be great if I could learn the advanced algorithms you guys mentioned above. It takes time, of course :wink:

odd

Posted: Wed Jun 30, 2004 2:17 pm
by sohel
Something very interesting :

After reversing the strings and using strstr() took .187 seconds...

.. later I thought of a better idea.. ie reverse then apply Boyer Moore.
.. but that gave TLE..

So it seems that for some cases strstr is better than Boyer Moore.

.. can any one tell me the reason why I get TLE with the modified method.
:-?