10679 - I Love Strings!!

anupam · Post by **anupam** » Wed Jun 30, 2004 6:30 pm

Thanks Adrian, got it. Found the book, named here, also.

Revenger · Post by **Revenger** » Wed Jun 30, 2004 7:05 pm

Can anyone explain me why my code (it was accepted 3 days later) gets TLE

. I am using KMP

[pascal]{$A-,B-,D+,E+,F-,G-,I-,L-,N-,O+,P-,Q-,R-,S-,T-,V-,X+}
const maxlen = 110000;

type my_string = record
len : longint;
sa : array[1..maxlen]of char;
end;

procedure read_str(var a : my_string);
var ch : char;
begin
a.len := 0;
while true do begin
read(ch);
if (ch = #10) or (ch = #13) then break;
a.len := a.len + 1;
a.sa[a.len] := ch;
end;
end;

var t, ti, k, i, j, q : longint;
inps, tst : my_string;
f : array[1..2000]of longint;

begin
readln(t);
for ti := 1 to t do begin
read_str(inps);
readln(k);
for i := 1 to k do begin
read_str(tst);
if tst.len = 0 then begin
writeln('y');
continue;
end;
f[1] := 0;
j := 0;
for q := 2 to tst.len do begin
while (j > 0) and (tst.sa[j + 1] <> tst.sa[q]) do j := f[j];
if tst.sa[j + 1] = tst.sa[q] then j := j + 1;
f[q] := j;
end;
q := 0;
for j := 1 to inps.len do begin
while (q > 0) and (tst.sa[q + 1] <> inps.sa[j]) do q := f[q];
if tst.sa[q + 1] = inps.sa[j] then q := q + 1;
if q = tst.len then break;
end;
if q = tst.len then writeln('y') else writeln('n');
end;
end;
end.[/pascal]

Adrian Kuegel · Post by **Adrian Kuegel** » Wed Jun 30, 2004 8:45 pm

It seems the problem limits were changed. Now there are up to 1000 queries.
And with KMP you have to go through the text now up to 1000 times, and that is too much.
So it seems that now you have to use Aho-Corasick (with this algorithm it is sufficient to go through the text once for a given set of patterns).

Observer · Post by **Observer** » Thu Jul 01, 2004 7:22 am

Adrian Kuegel wrote:It seems the problem limits were changed. Now there are up to 1000 queries.
And with KMP you have to go through the text now up to 1000 times, and that is too much.
So it seems that now you have to use Aho-Corasick (with this algorithm it is sufficient to go through the text once for a given set of patterns).

Now I think that this question is not very meaningful... It's like nearly impossible to be solved without a good algorithm book at home...

Md. Sajjad Hossain · Post by **Md. Sajjad Hossain** » Thu Jul 01, 2004 7:32 am

I have solved the problem with famous Ukkonen's (I hope the name is correct

) algorithm which is also known as suffix tree method. The original data was not very strong to differentiate between KMP and mine

Also it had some leaks(

)....Now they are fixed.....
--sajjad

htl · Post by **htl** » Thu Jul 01, 2004 11:18 am

Could someone provide the resources on the internet about the two algorithms (aho and Ukkonen)? I tried hard and found few of them.

ulin · Post by **ulin** » Thu Jul 01, 2004 6:16 pm

This links should be helpful

http://www.cs.uku.fi/~kilpelai/BSA04/le ... rint04.pdf
www-sr.informatik.uni-tuebingen.de/~buehler/AC/AC1.html

first one is about Aho-Corasic
second one is simulation of this algo. It helped me to understand.

Revenger · Post by **Revenger** » Fri Jul 02, 2004 9:20 am

Thanks, ulin! It was really helpful for me - I've got Accepted

Revenger · Post by **Revenger** » Fri Jul 02, 2004 10:24 am

Let's talk about 10679's tests. I think that they are very easy.

First of all, if in Aho-Corasick you compute failure function in dfs order you got accepted. That's very strage, because almost every randomized test check this.

Aho-Corasick have complexity O(m + z), where z is the number of pettern occurences. So, there are many tests where z = O(km). In such test cases Aho-Corasick algo is not better than KMP. But AC algo can be modifed in a such way, that it would have complexity O(m + k) for a current problem.

What's about memory? There is not any good test for checking memory usage. It is very hard to run under 32 MB. If you were not belive me try this test

[pascal]var i, j : longint;

begin
assign(output, '10679.in');
rewrite(output);

randseed := 1017358;

writeln(1);
for i := 1 to 100000 do write(chr(ord('a') + random(26))); writeln;
writeln(1000);
for i := 1 to 1000 do begin
write(chr(ord('a') + ((i div 676) mod 26)));
write(chr(ord('a') + ((i div 26) mod 26)));
write(chr(ord('a') + (i mod 26)));
for j := 4 to 1000 do write(chr(ord('a') + random(26))); writeln;
end;
end.[/pascal]

My program needs about 200 MB to solve this test. What about your program?

windows2k · Post by **windows2k** » Fri Jul 02, 2004 4:12 pm

Adrian Kuegel wrote:It seems the problem limits were changed. Now there are up to 1000 queries.
And with KMP you have to go through the text now up to 1000 times, and that is too much.
So it seems that now you have to use Aho-Corasick (with this algorithm it is sufficient to go through the text once for a given set of patterns).

Hello, Adrian Kuegel .
I have accepted the problem. Use the same algorithm with yours.
But it is much slower than your program.
How could you speed your program?
Thx

hoangmit · Post by **hoangmit** » Sun Jul 04, 2004 6:45 am

I've got troubles with reading input in the online judge.

Can anybody tell me why ?

[cpp]#include <iostream>
#include <cstdio>
#include <cassert>
#include <cctype>
#include <cstring>

using namespace std;

char text[1000002];
char pattern[1002];

int main()
{
//freopen("C:\\10679.in", "r", stdin);

int i, j, nTest, nQuery;

scanf("%d\n", &nTest);
for (i = 0; i < nTest; i++)
{
gets(text);
scanf("%d\n", &nQuery);
for (j = 0; j < nQuery; j++)
{
gets(pattern);
if (strlen(pattern) > 0)
assert(!isdigit(pattern[0])); // Must be wrong here

}
}
return 0;
}
[/cpp]

pminkov · Post by **pminkov** » Sun Jul 04, 2004 11:56 am

That's because someone introduced characters different than
alphabetic chars and digits (which are not allowed).
I supposed that this was messed up when resubmitting my
problem, but I was too tired to check what was wrong this
time. At least someone can fix the problem statement ...

Generally it's very frustrating to accept during a contest,
and then not being able to accept the problem in the online
judge

, and this adds another degree of frustration

Adrian Kuegel · Post by **Adrian Kuegel** » Sun Jul 04, 2004 4:39 pm

Aho-Corasick have complexity O(m + z), where z is the number of pettern occurences. So, there are many tests where z = O(km). In such test cases Aho-Corasick algo is not better than KMP. But AC algo can be modifed in a such way, that it would have complexity O(m + k) for a current problem.

Yes, and I modified the algorithm for this problem so it is O(m+k).

What's about memory? There is not any good test for checking memory usage. It is very hard to run under 32 MB. If you were not belive me try this test

Of course I believe you, and my program also needs a lot of memory (more than 32 MB). Theoretically I need up to q*1000*55*4 bytes of memory.

How could you speed your program?

Because only lowercase and uppercase letters are allowed, you need only 52 pointers to childs in your tree. This is the only speedup I used.

That's because someone introduced characters different than
alphabetic chars and digits (which are not allowed).

Yes, I also found out that there exists other characters in the input.
My program handles all invalid input characters as if they are 'a' (of course not intentionally, only because it assumes that only valid characters occur).
As the problemsetter probably reads this thread, I hope the judge input will be fixed soon.

CrazyTerabyte · Post by **CrazyTerabyte** » Wed Jul 28, 2004 1:16 am

Hey, the problem says that time limit is 4 seconds, but I see at ranklist some people with more than 4 seconds (about 13 people, with, at most, 8.6 seconds).

Can someone explain why this?

http://acm.uva.es/cgi-bin/OnlineJudge?P ... :10679:200

Andrey Mokhov · Post by **Andrey Mokhov** » Wed Jul 28, 2004 8:09 am

Hi!

Time limits in problems means the time limits that were in law during contest. Strangly enough after contest all the problems get default
time limit that is 10 seconds. Quite bad, I guess. I won't get AC on this problem if TL were 4 seconds and would search for a better algo

Maybe in this particular problem the changing of time limit from 4 to 10 seconds can be explained as the data set and problem statement has changed after contest. But they at least should change the title stating TL... but you know someone has to do it... and there are not lots of people who is doing such things - they can't do everything in a moment.

Buy.
Andrey.

OJ Board

10679 - I Love Strings!!

Bad test data

Time limit in 10679?