GNU bug report logs - #16472
PD: gzip - tree overflow protection algorithm doubt/suggestion

Previous Next

Package: gzip;

Reported by: "Krzysztof Rybak" <krzysztofrybak6 <at> wp.pl>

Date: Thu, 16 Jan 2014 23:52:02 UTC

Severity: normal

To reply to this bug, email your comments to 16472 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gzip <at> gnu.org:
bug#16472; Package gzip. (Thu, 16 Jan 2014 23:52:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Krzysztof Rybak" <krzysztofrybak6 <at> wp.pl>:
New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Thu, 16 Jan 2014 23:52:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Krzysztof Rybak" <krzysztofrybak6 <at> wp.pl>
To: bug-gzip <at> gnu.org
Subject: PD: gzip - tree overflow protection algorithm doubt/suggestion
Date: Thu, 16 Jan 2014 22:55:25 +0100
[Message part 1 (text/plain, inline)]
Hi,
I sent a message to the author of gzip but without response. Maybe You can help me with my doubts.

best regards,
Krzysztof

Dnia Czwartek, 2 Stycznia 2014 22:41 Krzysztof Rybak <krzysztofrybak6 <at> wp.pl> napisaƂ(a) 
> Hello Jean-loup,

> I'm preparing my own algorithm based on Huffman compression method and deflate format. 

> 

> I have doubts if method implemented in gzip program is 100% correct. 

> I mean tree overflow protection:

> just to remind: in deflate (RFC1951) and therefore in gzip there is creating tree for lit/len, dist (maximum length of codes is 15 bits) and code table (maximum length is 7 bits). In case of too long code word, there is tree modification implemented in gen_bitlen(desc) function (trees.c):

> 

>     do {

>         bits = max_length-1;

>         while (bl_count[bits] == 0) bits--;

>         bl_count[bits]--;      /* move one leaf down the tree */

>         bl_count[bits+1] += 2; /* move one overflow item as its brother */

>         bl_count[max_length]--;

>         overflow -= 2;

>     } while (overflow > 0);

> 

> I'm not sure if this method is correct for original code length greater than 8 ( for maximum 7 ) and than 16 ( for maximum 15): even not every time. I couldn't obtain such symbol distribution to generate so long codes on real files indeed as such situation is extremely rare but generated it by modifying software:

> 

> just before mentioned above overflow protection I changed bl_count histogram to one as for Fibonacci numbers (which is the worst for Huffman overflow - the longest codes, please refer to attached image): in my case I modified tree when maximum length is 7 bits (max_length = 7 is for code book compression):

>     bl_count[0] = 1;

>     bl_count[1] = 1;

>     bl_count[2] = 1;

>     bl_count[3] = 1;

>     bl_count[4] = 1;

>     bl_count[5] = 1;

>     bl_count[6] = 1;

>     bl_count[7] = 1;

>     bl_count[8] = 1;

>     bl_count[9] = 1;

>     bl_count[10] = 2;

>     overflow = 4;

> 

> in fact in gzip there is setting length = 7 when it's longer than 7:

> if (bits > max_length){

>         bits = max_length;

>         overflow++;

> }

> 

> ,so my histogram looks like:

>     bl_count[0] = 1;

>     bl_count[1] = 1;

>     bl_count[2] = 1;

>     bl_count[3] = 1;

>     bl_count[4] = 1;

>     bl_count[5] = 1;

>     bl_count[6] = 1;

>     bl_count[7] = 5;

>     bl_count[8] = 0;

>     bl_count[9] = 0;

>     overflow = 4;

> and after overflow protection method from gzip:

>     bl_count[0] = 1

>     bl_count[1] = 1

>     bl_count[2] = 1

>     bl_count[3] = 1

>     bl_count[4] = 1

>     bl_count[5] = 0

>     bl_count[6] = 2

>     bl_count[7] = 5

> , which is incorrect Huffman tree: too many elements. 

> 

> trees.c with my modification and printf() function is attached. I compress obj2 from Calgary corpus.

> I know my description may look unclear, but I hope You will help me/clarify my doubts.  

> 

> best regards,

> Krzysztof Rybak

[Fibonacci.png (image/png, attachment)]
[trees.c (text/x-csrc, attachment)]

This bug report was last modified 11 years and 151 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.