Michael Goldshteyn wrote:
P.S. for those that still do not understand the problem statement, here is a simplified version:

Given strings A and B. B dominates A if BOTH conditions below are satisfied:

- Every distinct character in A also appears in B
- Any character that appears multiple times in A must appear at least that many times in B

So, a simple way of saying this is that a string B dominates a string A if A can be constructed using some or all of the characters from B.

Here are examples where the first string is dominated by the second:

abc bac (in fact, each dominates the other, so neither is a dominant string)

the problem statement says something different:

Given two strings s1 and s2, we say that s1 dominates s2 if the (multi)set of characters in s1 is a proper superset of the (multi)set of characters in s2.

Yeah, that would be the correct interpretation. And if Michael got AC with the other interpretation, it seems I wasn't careful enough when constructing the data.

Not only that, but I'm not sure if an implementation that checked for a truly proper superset would even pass the test. But, I presume the test data and test are constant, correct? Or will the test data be changed, now? In any case, the performance penalty of proper superset vs. superset is negligible, because all you need to do is to compare string lengths to distinguish a proper superset from a permutation of a string. That is, if a string you think dominates is longer than the string you are comparing with, it is definitely a proper superset. This is a trivial O(N) operation, especially considering that the number of permutations in the input, especially for longer string lengths which are more difficult to dominate, is most likely negligible.

I just give each char with a prim,eg: a(2) b(3) c(5)...
so you can change each string to a long long int ,and use % to check!
so without any cut you will get ac, and if you improve it, I think you will get a better time:)

fennec wrote:I just give each char with a prim,eg: a(2) b(3) c(5)...
so you can change each string to a long long int ,and use % to check!
so without any cut you will get ac, and if you improve it, I think you will get a better time:)

That sounds like a horribly inefficient way of doing this. Mod (%) is very expensive.

fennec wrote:I just give each char with a prim,eg: a(2) b(3) c(5)...
so you can change each string to a long long int ,and use % to check!

This is not (completely) correct, since the 26th prime is 101, and "zzzzzzzzzz" would be 101^10, which is a bit too large for 63 or even 64 bits. However, I split the characters into a..w (maximum 83^10) and x..z (maximum 5^10) and did the modulo checks for these two products. This got AC, but ran 9 seconds. Sorting by length reduced it to 3.3 seconds.

1.mat[i][j] stores the number of occurrences of "j+'a' " character in string i.
2.loop overall i
2a. if i not marked dominated
3. loop overall j
3a. if j not marked
if mat[i][k]!=mat[j][k]
if i dominates j
mark j dominated
if j dominated i
mark i dominated
break out of j loop
4 loop overall i
if i not marked as dominated print string i.

Do you do the inner check (that with mat[i][k]) for all k? i dominates j iff mat[i][k] >= mat[j][k] for all k from 0 to 25. Theoretically, there must also be at least one k with mat[i][k] > mat[j][k], but since all strings are different, there is no need to check this. Remember to skip string i in the inner loop. I assume you sort the strings, since otherwise you wouldn't get the sample output right.

I've solved it in a little different way. At first I have sorted all strings (index-sorting for speed) ascendingly on:
1) length
2) num of 'a'
3) num of 'b'
4) num of 'c'
etc...

In that order of priority.

Than recursively I created a forest based on this sorting. First level nodes united strings with same length, then for each same-length subset of strings I grouped them by same amount of character 'a' on the 2nd level, on the 3rd level strings of same length and same amount for character 'a' were grouped on amount of character 'b', and so on until some segment representing subset ends up with the only string (dupes were eliminated before building the forest).

I guess this way it creates graph like O(N). Per-character graph can be blown off with a suitable test. This way it can't. 1024^2 nodes were enough, perhaps much less. I think it's something like 26*N*C.

Then, to test a string for being dominated, I check all root nodes with L>l, then for each sub-level of those nodes with NUM('a') >= num('a'), etc... where big letters stand for node properties, and small letters stand for string properties. Whenever I faced terminating segment (that is, referring to just one string), I checked the rest of it. I.e. if we end at level corresponding to letter 'e', do check for 'e'...'z' and see if it's dominated. Later this was optimized using the fact length-is-at-most-10.

This way it worked at about 4.6 sec. With optimizations based on strings length it worked under contests' 2 sec (namely 1.699 sec).