Page 2 of 3

Posted: Fri Mar 24, 2006 7:43 am
by Cho
andresw1 wrote:You have very nice idea but I have some doubts that this will give TLE. After you find the same hash as the patern you will need to do the slow compare to see if the matching is valid. For the test case that krijger mentioned (a lot of 'A's) this will give TLE. Do you do something else which I missed? Do you have AC?
You might misunderstand my idea. My run time is independent to the string content.
david wrote:
Cho wrote:Let c[j] be the character of the shorter string, I define its hash value as:
Summation over all valid i,j: c[j] * (3 ^ (i*100+j)) mod 2^32
(x^y denotes x to the power y instead of bitwise-XOR.)


Are you sure you didn't mean, for example,
c[j] * (32 ^ (i*100+j)) mod 2^32 ?
The hash function you wrote will fail to return different values for such simple strings as "ab" and "da" (assuming a is coded as 0), so I can't believe you got AC doing that without later checking if the match is exact (in which case you should get TLE for cases where both the matrix and the pattern are full of a's, for instance).

Yes, you are right. I should use 27 instead of 3. Forfunately, there are only 15 test cases. And my bug can only be caught if you decide one for it specifically.

david wrote:I tried to use as a hash funtion the summation over all valid i, j of
c[j] * (32 ^(i * pattern width + j)) mod 1000000007,
but I got WA after 9.7s, so even if this approach can be made to work I wonder how you managed to get it accepted in about 2.5 seconds.

I just tried your method and got AC in 7 secs. Make sure your arithmetic part is computed correctly.

Posted: Fri Mar 24, 2006 11:13 pm
by fmounir
Hello Cho,

How can you compute the hash values of all X*Y patterns in the large text in a constant time? Or at least in an O(n*m)?

Thanks.

Posted: Sat Mar 25, 2006 5:28 pm
by david
Cho wrote: I just tried your method and got AC in 7 secs. Make sure your arithmetic part is computed correctly.
You're right, that's enough to get AC. I mistakenly took X and Y to be the pattern width and height, respectively (this choice of letters in the problem statement may be a bit misleading).
Anyway, I still don't know how to make it run below the 5 seconds time limit set during the contest. Was hashing the intended solution to this problem?
fmounir wrote: How can you compute the hash values of all X*Y patterns in the large text in a constant time? Or at least in an O(n*m)?
This can be computed in O(n*m) by first storing the hash values of all X*1 patterns, and then computing the hash of an X*Y pattern based on the hash of the X*Y pattern just above it, along with the hash of the first line.

Posted: Sat Mar 25, 2006 5:35 pm
by andresw1
Hello,

So the solution is based on hashing the patern? What if we have the same hash for a different set of letters ot this is impossible?

Posted: Sat Mar 25, 2006 5:40 pm
by david
andresw1 wrote:Hello,
So the solution is based on hashing the patern? What if we have the same hash for a different set of letters ot this is impossible?
It is perfectly possible, but it is not likely that the judge test data will contain a case that will cause such a solution to fail.
The hashing solution would indeed be correct if we were not forced to take a modulus to make the result fit into an integer. Computing the hash value modulo two (or more) different primes would give you much better reliability, although this would most probably run beyond the time limit.

Posted: Sat Mar 25, 2006 5:42 pm
by andresw1
Very interesting...

So if we don't get AC with one prime, we change it... in the we will hit the right prime number which would give us AC:)))

Posted: Sat Mar 25, 2006 11:00 pm
by Emilio
David wrote:
This can be computed in O(n*m) by first storing the hash values of all X*1 patterns, and then computing the hash of an X*Y pattern based on the hash of the X*Y pattern just above it, along with the hash of the first line.
If your hash function depends of i and j, how do you compute your X*1 patterns in a constant time? I think for each column you must compute the first whole X*1 pattern and for the next ones in the same column in a constant time. Am I right? And if the answer is yes, how do you compute this?

Thanks, in advance.

Posted: Sat Mar 25, 2006 11:44 pm
by david
Emilio wrote: If your hash function depends of i and j, how do you compute yout X*1 patterns in a constant time? I think for each column you must compute the first whole X*1 pattern and for the next ones in the same column in a constant time. Am I right? And if the answer is yes, how do you compute this?

Thanks, in advance.
That's right. For each row, I first compute the hash of the first 1

Posted: Sun Mar 26, 2006 1:02 am
by andresw1
Hello,

David,
What do you mean by saying this is slow? Do you use double hashing or something more complex that makes it slow? I can imagine how it can bi done faster...

Posted: Sun Mar 26, 2006 1:15 am
by misof
andresw1 wrote:Hello,

David,
What do you mean by saying this is slow? Do you use double hashing or something more complex that makes it slow? I can imagine how it can bi done faster...
The modulo operation is slow. If you use a function resembling the one Cho described earlier, you can get the "modulo 2^32" for free -- you just have to compute everything in unsigned ints.

Aaaah, there's nothing like showing off some good old tricks of the trade :D

Posted: Sun Mar 26, 2006 1:31 am
by andresw1
misof,
As far as I know good hash is the one which uses nice primes... 2^32 is not one of them as far as I know. In addition Cho mentioned that you can use two different hashings and make a pair. This way you minimise the chance of wrong answer. Is it smart to use int and unsigned int as pair of mods?

Posted: Sun Mar 26, 2006 6:48 am
by Cho
andresw1 wrote:As far as I know good hash is the one which uses nice primes... 2^32 is not one of them as far as I know
The hash value (for one dimensional string) introduced by some text book is:
Summation over all k, c[k]*2^k mod p, where p is a prime.
But mine is:
Summation over all k, c[k]*p^k mod (2^32), where p is an odd number.
Both should work, I chose mine because I think (and indeed) it should be faster.
andresw1 wrote:In addition Cho mentioned that you can use two different hashings and make a pair.
It's david's idea, not mine. :)

Posted: Mon Mar 27, 2006 3:36 pm
by aminallam
[quote]That's right. For each row, I first compute the hash of the first 1

Posted: Wed Mar 29, 2006 12:58 am
by Emilio
Hi, I got AC.
I would must have read something about Karp-Rabin before asking anything.
Thanks anyway.

animallan you would must follow the same way than me, if you read something about Karp-Rabin you will understand it without troubles. In this moment I haven't the links, but google ...
Good luck!

Posted: Thu Mar 30, 2006 6:42 am
by aminallam
Thank you very much Mr. Emilio. I followed your advice and got AC.

For Mr. Misof:
I do not know how to use the trick of using unsigned int, and not using %. For example, consider this line:
hm2[j]=((hm2[i-1][j]+MOD-(hm1[i-x][j]*po[x-1])%MOD)*AS+hm1[j])%MOD;
How can I convert it using this trick? I have tried removing %MOD, and +MOD, but I got WA.