GNU bug report logs -
#16472
PD: gzip - tree overflow protection algorithm doubt/suggestion
Previous Next
To reply to this bug, email your comments to 16472 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gzip <at> gnu.org
:
bug#16472
; Package
gzip
.
(Thu, 16 Jan 2014 23:52:03 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
"Krzysztof Rybak" <krzysztofrybak6 <at> wp.pl>
:
New bug report received and forwarded. Copy sent to
bug-gzip <at> gnu.org
.
(Thu, 16 Jan 2014 23:52:03 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi,
I sent a message to the author of gzip but without response. Maybe You can help me with my doubts.
best regards,
Krzysztof
Dnia Czwartek, 2 Stycznia 2014 22:41 Krzysztof Rybak <krzysztofrybak6 <at> wp.pl> napisaĆ(a)
> Hello Jean-loup,
> I'm preparing my own algorithm based on Huffman compression method and deflate format.
>
> I have doubts if method implemented in gzip program is 100% correct.
> I mean tree overflow protection:
> just to remind: in deflate (RFC1951) and therefore in gzip there is creating tree for lit/len, dist (maximum length of codes is 15 bits) and code table (maximum length is 7 bits). In case of too long code word, there is tree modification implemented in gen_bitlen(desc) function (trees.c):
>
> do {
> bits = max_length-1;
> while (bl_count[bits] == 0) bits--;
> bl_count[bits]--; /* move one leaf down the tree */
> bl_count[bits+1] += 2; /* move one overflow item as its brother */
> bl_count[max_length]--;
> overflow -= 2;
> } while (overflow > 0);
>
> I'm not sure if this method is correct for original code length greater than 8 ( for maximum 7 ) and than 16 ( for maximum 15): even not every time. I couldn't obtain such symbol distribution to generate so long codes on real files indeed as such situation is extremely rare but generated it by modifying software:
>
> just before mentioned above overflow protection I changed bl_count histogram to one as for Fibonacci numbers (which is the worst for Huffman overflow - the longest codes, please refer to attached image): in my case I modified tree when maximum length is 7 bits (max_length = 7 is for code book compression):
> bl_count[0] = 1;
> bl_count[1] = 1;
> bl_count[2] = 1;
> bl_count[3] = 1;
> bl_count[4] = 1;
> bl_count[5] = 1;
> bl_count[6] = 1;
> bl_count[7] = 1;
> bl_count[8] = 1;
> bl_count[9] = 1;
> bl_count[10] = 2;
> overflow = 4;
>
> in fact in gzip there is setting length = 7 when it's longer than 7:
> if (bits > max_length){
> bits = max_length;
> overflow++;
> }
>
> ,so my histogram looks like:
> bl_count[0] = 1;
> bl_count[1] = 1;
> bl_count[2] = 1;
> bl_count[3] = 1;
> bl_count[4] = 1;
> bl_count[5] = 1;
> bl_count[6] = 1;
> bl_count[7] = 5;
> bl_count[8] = 0;
> bl_count[9] = 0;
> overflow = 4;
> and after overflow protection method from gzip:
> bl_count[0] = 1
> bl_count[1] = 1
> bl_count[2] = 1
> bl_count[3] = 1
> bl_count[4] = 1
> bl_count[5] = 0
> bl_count[6] = 2
> bl_count[7] = 5
> , which is incorrect Huffman tree: too many elements.
>
> trees.c with my modification and printf() function is attached. I compress obj2 from Calgary corpus.
> I know my description may look unclear, but I hope You will help me/clarify my doubts.
>
> best regards,
> Krzysztof Rybak
[Fibonacci.png (image/png, attachment)]
[trees.c (text/x-csrc, attachment)]
This bug report was last modified 11 years and 151 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.