From: Namikaze Minato
Date: Wed, 26 Dec 2018 17:24:14 +0100
Subject: zcat vs zcat -f -- different output
To: bug-gzip@gnu.org

Hello guys.

I have a large number of confidential gzip-compressed binary files.
These files _all_ share one very specific property: the output differs
depending on whether or not I use the "-f" flag of zcat (or gzip -d -c).
One additional line appears when I use the -f flag.

- I don't have the uncompressed versions of these files, nor the actual
  tool used to compress them
- I am trying to create a reproducible example but have not yet succeeded

Here is what it looks like, with null bytes replaced by dots for readability:
(sorry for gmail's automatic line wrap, there are of course only two
lines per output)

$ file p.gz
p.gz: gzip compressed data, was "20181218.TXT", last modified: Wed Dec 19 08:59:07 2018, from NTFS filesystem (NT)
$ wc -c p.gz
9099264 p.gz
$ zcat p.gz | wc -c
48085600
$ zcat -f p.gz | wc -c
48085955
$ gzip -d -c p.gz | tail -2 | sed -n 'l 0' | sed 's/\\000/./g'
20010101AAAAAAAA 010120010101Q AA....00A0000000AA0AA0AAA 0A AA 0101 0012001010101:01T2001012001:0101:01AAAAAAD/S\r$
T000378625..................... ...............................................\r$
$ gzip -d -c -f p.gz | tail -2 | sed -n 'l 0' | sed 's/\\000/./g'
T000378625..................... ...............................................\r$
...........................................................................
...........................................................................
...........................................................................
...........................................................................
.......................................................$

That additional line containing only null bytes is not supposed to
appear. Is it some kind of padding that was not handled correctly by gzip?

If this is not yet an identified bug, here are my questions:
Do you know what could be happening?
Do you know how I could try to reproduce the problem on
non-confidential data for you to be able to debug?
(I already tried re-compressing both versions of the decompressed
files with this binary from 2007:
http://gnuwin32.sourceforge.net/packages/gzip.htm but the problem does
not occur.)

I can contact the people who created the files and ask them anything,
but I'd like to be sure of what to ask, because contacting them
repeatedly would be considered very rude.
What should I ask them?

Thank you very much in advance for any reply that helps me understand
what is happening :)

Minato

PS: I am not subscribed to the mailing list yet

From: Paul Eggert
Organization: UCLA Computer Science Department
Date: Wed, 26 Dec 2018 10:03:57 -0800
Subject: Re: bug#33878: zcat vs zcat -f -- different output
To: Namikaze Minato, 33878-done@debbugs.gnu.org

Namikaze Minato wrote:

> Do you know what could be happening?

When gzip -cdf sees junk input data, it simply copies it to standard output;
this behavior is documented in the gzip manual (look for --force). Your input
files have NUL-byte padding at the end, contrary to Internet RFC 1952.

> Do you know how I could try to reproduce the problem on
> non-confidential data for you to be able to debug?

$ (gzip t.gz
$ gzip -cd
0000000
$ gzip -cdf
0000000 \0
0000001

Though it's not a bug....

From: Namikaze Minato
Date: Thu, 27 Dec 2018 14:09:37 +0100
Subject: Re: bug#33878: zcat vs zcat -f -- different output
To: 33878-done@debbugs.gnu.org

Thanks a lot for the explanation!
I checked, and my files do contain unexpected trailing NUL bytes!

Have a nice day.

Minato

On Wed, 26 Dec 2018 at 19:03, Paul Eggert wrote:
>
> Namikaze Minato wrote:
>
> > Do you know what could be happening?
>
> When gzip -cdf sees junk input data, it simply copies it to standard output;
> this behavior is documented in the gzip manual (look for --force). Your input
> files have NUL-byte padding at the end, contrary to Internet RFC 1952.
>
> > Do you know how I could try to reproduce the problem on
> > non-confidential data for you to be able to debug?
>
> $ (gzip t.gz
> $ gzip -cd
> 0000000
> $ gzip -cdf
> 0000000 \0
> 0000001
>
> Though it's not a bug....

From: Debbugs Internal Request
Date: Fri, 25 Jan 2019 12:24:04 +0000
Subject: Internal Control
To: internal_control@debbugs.gnu.org

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
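
For anyone wanting to observe the behaviour discussed in this thread on
non-confidential data, the following is a minimal sketch and not Paul
Eggert's exact commands. It assumes GNU gzip and a POSIX shell. The file
name p.gz, the sample payload, and the 355-byte pad are illustrative
choices; 355 is simply the difference between the two byte counts in the
original report (48085955 - 48085600).

```shell
#!/bin/sh
# Sketch only: build a valid gzip stream, append NUL padding, then compare
# the decompressed size with and without -f. Names and sizes are
# illustrative assumptions, not taken from the reporter's real files.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A well-formed gzip member holding six bytes of payload.
printf 'hello\n' | gzip -c > "$tmp/p.gz"

# Append 355 NUL bytes after the gzip trailer.
dd if=/dev/zero bs=1 count=355 2>/dev/null >> "$tmp/p.gz"

# Without -f the trailing NULs are skipped; with -f (and -c) whatever
# follows the gzip member is copied to standard output unchanged.
# The $(( ... )) wrapper strips any padding that wc adds to its output.
plain=$(( $(gzip -cd < "$tmp/p.gz" 2>/dev/null | wc -c) ))
forced=$(( $(gzip -cdf < "$tmp/p.gz" 2>/dev/null | wc -c) ))
extra=$(( forced - plain ))

echo "plain=$plain forced=$forced extra=$extra"
```

If `extra` matches the number of appended bytes, the file shows the same
symptom as in the report: the difference between zcat and zcat -f comes
from padding after the compressed data, not from the data itself.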