How can I implement huffman encoding & solve this problem?
I know about huffman encoding theoretically. But I encounter with some
confusing in real implementation...
If you experienced implemeting huffman encoding or solving this
problem , then please tell me about your know-how.
Thanks.
240 - Variable Radix Huffman Encoding
Moderator: Board moderators
hello,
i just solved this problem. (i got PE).
my implementation has the following structures:
an encoding tree that is stored as an array of fathers. each node in the tree has additional fields that include: letter, weigth, position relative to siblings. these informations will allow building the tree incrementally, and the position field will then allow to reconstruct the code associated with each leaf. (by climbing from leaf to root.)
a heap. read about heaps if you need. it is very helpful in this problem. the heap keeps a structure that allow you to fetch the "smallest" node at any given time. (it is always on top of the heap). functions to insert/delete nodes from heap are fast (log n). of course you just need to put a pointer (or index) in the heap. the actual nodes data doesn't need to be moved.
i hope this can help you.
i can give you my source code in C if you have trouble with it and are interested to see a way of solving this problem.
let me know if you get AC.
i just solved this problem. (i got PE).
my implementation has the following structures:
an encoding tree that is stored as an array of fathers. each node in the tree has additional fields that include: letter, weigth, position relative to siblings. these informations will allow building the tree incrementally, and the position field will then allow to reconstruct the code associated with each leaf. (by climbing from leaf to root.)
a heap. read about heaps if you need. it is very helpful in this problem. the heap keeps a structure that allow you to fetch the "smallest" node at any given time. (it is always on top of the heap). functions to insert/delete nodes from heap are fast (log n). of course you just need to put a pointer (or index) in the heap. the actual nodes data doesn't need to be moved.
i hope this can help you.
i can give you my source code in C if you have trouble with it and are interested to see a way of solving this problem.
let me know if you get AC.
We never perform a computation ourselves, we just hitch a ride on the great Computation that is going on already. --Tomasso Toffoli
-
- Experienced poster
- Posts: 106
- Joined: Sun Feb 17, 2002 2:00 am
- Location: Seoul, South Korea
- Contact:
Hello,
Thank you for your kindly answer.
If you can, please send to me your accepted source code.
My e - mail address is soyoja@nownuri.net.
I believe that it must very helpful to me.
Thank you & God bleesing you ~ : )
Thank you for your kindly answer.
If you can, please send to me your accepted source code.
My e - mail address is soyoja@nownuri.net.
I believe that it must very helpful to me.
Thank you & God bleesing you ~ : )
-
- Learning poster
- Posts: 53
- Joined: Sat May 01, 2004 9:31 pm
- Contact:
240 - Huffman - Segfaulting - asserts up the wazoo!
Hi all,
I'm having a bear of a time with problem 240. I'm pretty sure it works okay, but I keep getting segfaults whenever I submit. Here is my (ugly, messy) code -- anyone have any insights as to what could be causing me grief? On my computer (gcc 3.2.3) it works great. Here is my testing input file:
and here is my source:
[cpp]
#include <iostream>
#include <string>
#include <sstream>
#include <stdio.h>
#include <assert.h>
#define ALPHAMAX 50
#define KIDMAX 10
using namespace std;
typedef struct lettergroup;
/* GLOBALS */
int r, n, t;
lettergroup *alpha[ALPHAMAX];
string spacer = "Set";
int set = 0;
typedef struct lettergroup
{
int freq;
bool flagged;
string huffcode;
lettergroup *c[KIDMAX], *papa;
/* Common init stuff (um, could I please chain constructors?! */
void init_stuff(void)
{
for (int j = 0; j < r; j++)
c[j] = 0;
papa = 0;
flagged = false;
huffcode = "";
}
/* Constructor for leaf nodes; i = frequency */
lettergroup(int i)
{
freq = i;
init_stuff();
}
/* Constructor for intermediate nodes */
lettergroup(void)
{
init_stuff();
freq = 0;
}
void flag(void)
{
if (c[0])
{
for (int i = 0; i < r; i++)
{
assert(c);
c->flag();
}
return;
}
flagged = true;
}
/* Finds the highest-up ancestor group of this group */
lettergroup *first_parent()
{
if (papa)
return papa->first_parent();
return this;
}
~lettergroup(void)
{
int i;
for (i = 0; i < KIDMAX; i++)
{
if (c)
delete c;
c = 0;
}
}
};
/* Reset the alphabet for a new case */
void reset(void)
{
if (alpha[0])
{
lettergroup *big_top = alpha[0]->first_parent();
delete big_top;
}
int i;
for (i = 0; i < ALPHAMAX; i++)
alpha = 0;
}
/* Huffman-encodes a subtree */
void populate(lettergroup *a, string code)
{
assert(a);
if (!a->c[0])
a->huffcode = code;
else
{
for (int i = 0; i < r; i++)
{
assert(a->c);
populate(a->c, code + (char)(i + '0'));
}
}
}
/* Does the Huffman Encoding for each letter */
void huffman(void)
{
lettergroup *top = alpha[0]->first_parent();
assert(top);
populate(top, "");
}
/* Groups the letters into a tree structure */
void treegen(void)
{
int pass_count = (t - r) / (r - 1) + 1;
assert((t - r) % (r - 1) == 0);
while (pass_count--)
{
assert(alpha[0]);
lettergroup *dad = alpha[0]->first_parent(), *newgroup;
int i, j, best = 0, lowest_score = dad->freq;
assert(lowest_score);
/* new pass, nothing has been used yet */
for (i = 0; i < t; i++)
{
assert(alpha);
alpha->flagged = false;
}
/* first pass, find the lowest */
for (i = 1; i < t; i++)
{
dad = alpha[i]->first_parent();
if (dad->freq < lowest_score)
{
lowest_score = dad->freq;
best = i;
}
}
/* make the new group */
dad = alpha[best]->first_parent();
assert(dad);
dad->flag(); // flag the whole group as used
newgroup = dad->papa = new lettergroup();
newgroup->c[0] = dad;
newgroup->freq = dad->freq;
/* now do all remaining passes and add them in-order to the group */
for (i = 1; i < r; i++)
{
for (j = 0; j < t; j++)
{
assert(alpha[j]);
if (!alpha[j]->flagged)
{
best = j;
dad = alpha[j]->first_parent();
assert(dad);
lowest_score = dad->freq;
break;
}
}
while (j < t)
{
assert(alpha[j]);
if (!alpha[j]->flagged)
{
dad = alpha[j]->first_parent();
assert(dad);
if (dad->freq < lowest_score)
{
lowest_score = dad->freq;
best = j;
}
}
j++;
}
dad = alpha[best]->first_parent();
assert(dad);
dad->flag();
dad->papa = newgroup;
newgroup->c[i] = dad;
newgroup->freq += dad->freq;
} //for
} //while
}
/* Average length of set (round to 2 dec. places) */
double avelen(void)
{
double dReturn = 0.0;
int tot_freq = 0;
for (int i = 0; i < n; i++)
{
assert(alpha[i]);
tot_freq += alpha[i]->freq;
dReturn += double(alpha[i]->freq * (alpha[i]->huffcode).length());
}
assert(tot_freq > 0);
return dReturn / tot_freq;
}
/* Output the results */
void output(void)
{
int i;
printf("%s %d; average length %4.2f\n", spacer.data(), set, avelen());
for (i = 0; i < n; i++)
cout << " " << char(i + 'A') << ": " << alpha[i]->huffcode << endl;
}
int main(int argc, char **argv)
{
cin >> r;
string line;
while (r > 0)
{
set++;
cin >> n;
t = n; // t is total number of symbols, including fictitious ones
assert(r > 1);
if ((n - 1) % (r - 1)) // if it won't work out evenly, add fake ones
t += r - 1 - ((n - r) % (r - 1));
assert(r <= 10 && r >= 2 && n >= 2 && n <= 26 && t >= n && t <= ALPHAMAX);
reset();
getline(cin, line);
istringstream iss(line);
int i = 0, q;
while (iss >> q)
{
assert(q <= 999 && q >= 1);
alpha[i++] = new lettergroup(q);
}
if (n < t)
while (i <= t)
alpha[i++] = new lettergroup(0);
treegen();
huffman();
output();
cin >> r;
spacer = "\nSet";
}
return 0;
}
[/cpp]
I'm having a bear of a time with problem 240. I'm pretty sure it works okay, but I keep getting segfaults whenever I submit. Here is my (ugly, messy) code -- anyone have any insights as to what could be causing me grief? On my computer (gcc 3.2.3) it works great. Here is my testing input file:
Code: Select all
2 5 5 10 20 25 40
2 5 4 2 2 1 1
3 7 20 5 8 5 12 6 9
4 6 10 23 18 25 9 12
10 26 4 28 28 17 18 18 3 81 785 28 198 39 19 284 200 29 3 3 2 29 478 209 28 17 28 394
2 26 4 28 28 17 18 18 3 81 785 28 198 39 19 284 200 29 3 3 2 29 478 209 28 17 28 394
3 2 21 12
3 4 5 7 8 15
0
[cpp]
#include <iostream>
#include <string>
#include <sstream>
#include <stdio.h>
#include <assert.h>
#define ALPHAMAX 50
#define KIDMAX 10
using namespace std;
typedef struct lettergroup;
/* GLOBALS */
int r, n, t;
lettergroup *alpha[ALPHAMAX];
string spacer = "Set";
int set = 0;
typedef struct lettergroup
{
int freq;
bool flagged;
string huffcode;
lettergroup *c[KIDMAX], *papa;
/* Common init stuff (um, could I please chain constructors?! */
void init_stuff(void)
{
for (int j = 0; j < r; j++)
c[j] = 0;
papa = 0;
flagged = false;
huffcode = "";
}
/* Constructor for leaf nodes; i = frequency */
lettergroup(int i)
{
freq = i;
init_stuff();
}
/* Constructor for intermediate nodes */
lettergroup(void)
{
init_stuff();
freq = 0;
}
void flag(void)
{
if (c[0])
{
for (int i = 0; i < r; i++)
{
assert(c);
c->flag();
}
return;
}
flagged = true;
}
/* Finds the highest-up ancestor group of this group */
lettergroup *first_parent()
{
if (papa)
return papa->first_parent();
return this;
}
~lettergroup(void)
{
int i;
for (i = 0; i < KIDMAX; i++)
{
if (c)
delete c;
c = 0;
}
}
};
/* Reset the alphabet for a new case */
void reset(void)
{
if (alpha[0])
{
lettergroup *big_top = alpha[0]->first_parent();
delete big_top;
}
int i;
for (i = 0; i < ALPHAMAX; i++)
alpha = 0;
}
/* Huffman-encodes a subtree */
void populate(lettergroup *a, string code)
{
assert(a);
if (!a->c[0])
a->huffcode = code;
else
{
for (int i = 0; i < r; i++)
{
assert(a->c);
populate(a->c, code + (char)(i + '0'));
}
}
}
/* Does the Huffman Encoding for each letter */
void huffman(void)
{
lettergroup *top = alpha[0]->first_parent();
assert(top);
populate(top, "");
}
/* Groups the letters into a tree structure */
void treegen(void)
{
int pass_count = (t - r) / (r - 1) + 1;
assert((t - r) % (r - 1) == 0);
while (pass_count--)
{
assert(alpha[0]);
lettergroup *dad = alpha[0]->first_parent(), *newgroup;
int i, j, best = 0, lowest_score = dad->freq;
assert(lowest_score);
/* new pass, nothing has been used yet */
for (i = 0; i < t; i++)
{
assert(alpha);
alpha->flagged = false;
}
/* first pass, find the lowest */
for (i = 1; i < t; i++)
{
dad = alpha[i]->first_parent();
if (dad->freq < lowest_score)
{
lowest_score = dad->freq;
best = i;
}
}
/* make the new group */
dad = alpha[best]->first_parent();
assert(dad);
dad->flag(); // flag the whole group as used
newgroup = dad->papa = new lettergroup();
newgroup->c[0] = dad;
newgroup->freq = dad->freq;
/* now do all remaining passes and add them in-order to the group */
for (i = 1; i < r; i++)
{
for (j = 0; j < t; j++)
{
assert(alpha[j]);
if (!alpha[j]->flagged)
{
best = j;
dad = alpha[j]->first_parent();
assert(dad);
lowest_score = dad->freq;
break;
}
}
while (j < t)
{
assert(alpha[j]);
if (!alpha[j]->flagged)
{
dad = alpha[j]->first_parent();
assert(dad);
if (dad->freq < lowest_score)
{
lowest_score = dad->freq;
best = j;
}
}
j++;
}
dad = alpha[best]->first_parent();
assert(dad);
dad->flag();
dad->papa = newgroup;
newgroup->c[i] = dad;
newgroup->freq += dad->freq;
} //for
} //while
}
/* Average length of set (round to 2 dec. places) */
double avelen(void)
{
double dReturn = 0.0;
int tot_freq = 0;
for (int i = 0; i < n; i++)
{
assert(alpha[i]);
tot_freq += alpha[i]->freq;
dReturn += double(alpha[i]->freq * (alpha[i]->huffcode).length());
}
assert(tot_freq > 0);
return dReturn / tot_freq;
}
/* Output the results */
void output(void)
{
int i;
printf("%s %d; average length %4.2f\n", spacer.data(), set, avelen());
for (i = 0; i < n; i++)
cout << " " << char(i + 'A') << ": " << alpha[i]->huffcode << endl;
}
int main(int argc, char **argv)
{
cin >> r;
string line;
while (r > 0)
{
set++;
cin >> n;
t = n; // t is total number of symbols, including fictitious ones
assert(r > 1);
if ((n - 1) % (r - 1)) // if it won't work out evenly, add fake ones
t += r - 1 - ((n - r) % (r - 1));
assert(r <= 10 && r >= 2 && n >= 2 && n <= 26 && t >= n && t <= ALPHAMAX);
reset();
getline(cin, line);
istringstream iss(line);
int i = 0, q;
while (iss >> q)
{
assert(q <= 999 && q >= 1);
alpha[i++] = new lettergroup(q);
}
if (n < t)
while (i <= t)
alpha[i++] = new lettergroup(0);
treegen();
huffman();
output();
cin >> r;
spacer = "\nSet";
}
return 0;
}
[/cpp]
_-(GPI)-_
"Finally I have freed myself from the clutches of the garbage fairy!"
"Finally I have freed myself from the clutches of the garbage fairy!"
240 Variable Radix Huffman Encoding
the priority queue in this problem has a little trick, if there's a tie between a letter node and a "combination letter" (this node is the sum between two other nodes) then the "combination letter" node wins.
if there's a tie between two combination letters then the first added node wins.
try the I/O below
INPUT:
2 23 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67
2 14 1 667 334 1 667 334 1 667 334 1 667 334 1 667
0
my accepted code OUTPUT:
Set 1; average length 4.27
A: 01001100
B: 01001101
C: 0100111
D: 010010
E: 101110
F: 101111
G: 01000
H: 01110
I: 01111
J: 10110
K: 11110
L: 11111
M: 0010
N: 0011
O: 0101
P: 0110
Q: 1000
R: 1001
S: 1010
T: 1100
U: 1101
V: 1110
W: 000
Set 2; average length 3.22
A: 0010110
B: 010
C: 0011
D: 0010111
E: 011
F: 1110
G: 001000
H: 100
I: 1111
J: 001001
K: 101
L: 000
M: 001010
N: 110
if there's a tie between two combination letters then the first added node wins.
try the I/O below
INPUT:
2 23 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67
2 14 1 667 334 1 667 334 1 667 334 1 667 334 1 667
0
my accepted code OUTPUT:
Set 1; average length 4.27
A: 01001100
B: 01001101
C: 0100111
D: 010010
E: 101110
F: 101111
G: 01000
H: 01110
I: 01111
J: 10110
K: 11110
L: 11111
M: 0010
N: 0011
O: 0101
P: 0110
Q: 1000
R: 1001
S: 1010
T: 1100
U: 1101
V: 1110
W: 000
Set 2; average length 3.22
A: 0010110
B: 010
C: 0011
D: 0010111
E: 011
F: 1110
G: 001000
H: 100
I: 1111
J: 001001
K: 101
L: 000
M: 001010
N: 110
-
- New poster
- Posts: 1
- Joined: Sat Apr 20, 2002 11:57 pm
Re: 240 Variable Radix Huffman Encoding
For the input:
My WA attempt produces the output:
Please indicate if this is incorrect.
Code: Select all
10 26 100 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Code: Select all
Set 1; average length 1.14
A: 9
B: 72
C: 73
D: 74
E: 75
F: 76
G: 77
H: 78
I: 79
J: 80
K: 81
L: 82
M: 83
N: 84
O: 85
P: 86
Q: 87
R: 88
S: 89
T: 0
U: 1
V: 2
W: 3
X: 4
Y: 5
Z: 6
The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents. - Nathaniel Borenstein
-
- New poster
- Posts: 17
- Joined: Fri Oct 21, 2016 12:58 pm
- Location: NS, Canada
Re: 240 - Variable Radix Huffman Encoding
Yes, alexiago is right! Just got Accepted. Although this rule is not written very well in the problem description. Such a pity!