## 283 - Compress

Moderator: Board moderators

Cloud
New poster
Posts: 10
Joined: Sat Dec 29, 2001 2:00 am
Contact:
give me some tricky data pls , i dont know why my program got WA

Cloud
New poster
Posts: 10
Joined: Sat Dec 29, 2001 2:00 am
Contact:
I 've just got AC

hlchan
New poster
Posts: 7
Joined: Sat May 25, 2002 8:15 am

### Poor question!

The test input contains charactors besides 'A'..'Z', 'a'...'z', '.', ' ', ',', '-', '\$'.

cytse
Learning poster
Posts: 67
Joined: Mon Sep 16, 2002 2:47 pm
Location: Hong Kong
Contact:

### 283 - Compress

Can anyone explain why the solution for sample input is 335 and 167?

Per
A great helper
Posts: 429
Joined: Fri Nov 29, 2002 11:27 pm
Location: Sweden
Doing only the second case as it's smaller, 167 is what you get when using, for instance, this encoding scheme:
[space] -> 000
b -> 001
e -> 010
h -> 011
i -> 100
o -> 101
t -> 110
a -> 111000
n -> 111001
q -> 111010
r -> 111011
s -> 111100
T -> 111101
u -> 111110
\$ -> 1111110
. -> 11111110
, -> 11111111
And I think you'll find a hard time trying to do better using the encoding method of the problem.

Even though I've finally managed to get the right output for the sample data, I still get WA. Could anyone of the few people who are actually accepted on this one give the corresponding output for the following input:
2
3
\$
\$
\$
4
ABCABC123321123ABC\$
ABC
ABC
123\$

Learning poster
Posts: 73
Joined: Mon Oct 14, 2002 7:15 am
Location: United States
Would someone mind explaining the method for determining the encoding? The question seems quite vague and ambiguous .e.g.
Suppose you want to give some characters a code length of 3.
Then in next paragraph:
You choose to give the first three characters a code with length 2 and the rest an equal length.
Based upon the statement of the problem, one would think a Huffman Encoding is appropriate, but if I am not mistaken, a Huffman Encoding gives a total length which is shorter than the "minimum" for the sample inputs. So what is the process for determining the "minimum" encoding that it asks for?

Thanks.

Guru
Posts: 724
Joined: Wed Dec 19, 2001 2:00 am
Location: Germany
Per: The output for your input is:
3
86

The method is the following:
You can use all bit lengths between 1 and 8, and you can choose for example to use bit length 2 for 3 characters and bit length (2+4) for up to 16 characters.
I use backtracking with memoization to calculate the minimum.

Per
A great helper
Posts: 429
Joined: Fri Nov 29, 2002 11:27 pm
Location: Sweden
Ah, I got accepted. All strange test cases considered, it still came down to a silly array initialization bug.

I did something similar; a simple O(n^2) (which could triviallly be turned into O(n log n)) thing using dynamic programming (where n is the number of distinct characters in the text).

cytse
Learning poster
Posts: 67
Joined: Mon Sep 16, 2002 2:47 pm
Location: Hong Kong
Contact:
Thanks all. I misunderstood the problem.

cplusplus
New poster
Posts: 13
Joined: Fri Feb 07, 2003 10:20 pm

as the question describe
the first line of the test file will be an number N
and there will be N test cases....
the frist line of every case is a number m,then followed by n lines of data
then here is my code, I just read the in the data without doing anything
..........
int N,nLine,length;
char line[255];
cin>>N;
while(N--)
{
cin>>nLine;
while(nLine--)
{
length=0;
while(length==0)
{
cin.getline(line,255);
length=strlen(line);
}
}
}

but it get "time limit exceeded"....about 30 times.....
why?? I have tried many different ways.....but it dosen't work...
Thank you very much

cytse
Learning poster
Posts: 67
Joined: Mon Sep 16, 2002 2:47 pm
Location: Hong Kong
Contact:
Some input lines are quite long. Your character array is not large enough to get the whole line.

cplusplus
New poster
Posts: 13
Joined: Fri Feb 07, 2003 10:20 pm
cytse wrote:Some input lines are quite long. Your character array is not large enough to get the whole line.
ya!!
Thank you very much!!!!!
That's it!!
I assume there are most 1024 chars per line,so.....>"<
I have got AC now,finally

Thank you very much!!

epsilon0
Experienced poster
Posts: 112
Joined: Tue Nov 12, 2002 11:15 pm
Location: Paris, France.
this problem is STUPID

1] the text is very unclear about the encoding scheme. so i assume it is an huffman encoding (= optimal bitlength) with an extra condition, that all codes must have 8 bit maximum. so i assumed it was a "modified huffman".

2] i'm sure the sample output is wrong if my assumption is not.

3] i exhibit a 155 bit encoding scheme for the second sample, demonstrating my point:

To be or not to be, that is the question.\$

'T' (1 occur) -> 111000
'o' (5 occur) -> 011
' ' (9 occur) -> 00
'b' (2 occur) -> 1001
'e' (4 occur) -> 1000
'r' (1 occur) -> 111001
'n' (2 occur) -> 1010
't' (6 occur) -> 010
',' (1 occur) -> 111010
'h' (2 occur) -> 1011
'a' (1 occur) -> 111011
'i' (2 occur) -> 1100
's' (2 occur) -> 1101
'q' (1 occur) -> 111100
'u' (1 occur) -> 111101
'.' (1 occur) -> 111110
'\$' (1 occur) -> 111111

total is 9 * 2 + 11 * 3 + 14 * 4 + 8 * 6
= 18 + 33 + 56 + 48
= 155 bits.

this is a correct huffman encoding and all bitlengths are between 1 and 8.

PS: please dont formalise over an eventual error in the above table, as i partially computed it by hand. the point is that such an encoding exists that make it 155 bits, using:

1 code of length 2 bit
2 codes of length 3 bit
6 codes of length 4 bit
8 codes of length 6 bit

of course there are lots of ways to then build an encoding scheme with the same bitlength of 155 bit (just allocate the codes differently, swap the codes of same length, etc...)

so WTF is wrong with that problem? HELP
We never perform a computation ourselves, we just hitch a ride on the great Computation that is going on already. --Tomasso Toffoli

Caesum
Experienced poster
Posts: 225
Joined: Fri May 03, 2002 12:14 am
Location: UK
Contact:
epsilon,

it is not huffmann. you can code so many characters, for example using 4 bits=16 characters or you can use *one* of those characters to define an extension=15 characters plus one extension code.

junbin
Experienced poster
Posts: 174
Joined: Mon Dec 08, 2003 10:41 am
Per wrote: Even though I've finally managed to get the right output for the sample data, I still get WA. Could anyone of the few people who are actually accepted on this one give the corresponding output for the following input:
2
3
\$
\$
\$
4
ABCABC123321123ABC\$
ABC
ABC
123\$
I don't think there's such output...
1) All lines must end with a \$
2) numbers (1, 2, 3) are not valid characters.

Therefore, any answers you get for this is probably undefined... (my AC program gives a different answer than the one given below)