GNU bug report logs - #32342
[PATCH] Do not store mtime when compressing stdin

Previous Next

Package: gzip;

Reported by: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>

Date: Wed, 1 Aug 2018 11:53:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 32342 in the body.
You can then email your comments to 32342 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gzip <at> gnu.org:
bug#32342; Package gzip. (Wed, 01 Aug 2018 11:53:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>:
New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Wed, 01 Aug 2018 11:53:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>
To: bug-gzip <at> gnu.org
Cc: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>
Subject: [PATCH] Do not store mtime when compressing stdin
Date: Wed,  1 Aug 2018 13:51:40 +0200
This allows for reproducible output of
echo foo | gzip | md5sum
tar -cz -H ustar SOMEFILE | md5sum

See https://reproducible-builds.org/ for why this is good.
---
 gzip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gzip.c b/gzip.c
index a023d81..4b00e42 100644
--- a/gzip.c
+++ b/gzip.c
@@ -767,7 +767,7 @@ local void treat_stdin()
       {
         if (S_ISREG (istat.st_mode))
           time_stamp = get_stat_mtime (&istat);
-        else
+        else if (decompress)
           gettime (&time_stamp);
       }
 
-- 
2.16.4





Information forwarded to bug-gzip <at> gnu.org:
bug#32342; Package gzip. (Thu, 02 Aug 2018 06:48:02 GMT) Full text and rfc822 format available.

Message #8 received at 32342 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>, 32342 <at> debbugs.gnu.org
Subject: Re: bug#32342: [PATCH] Do not store mtime when compressing stdin
Date: Wed, 1 Aug 2018 23:47:48 -0700
Bernhard M. Wiedemann wrote:
> This allows for reproducible output of
> echo foo | gzip | md5sum

gzip -n already does that, so we don't need to change the gzip source code to 
get reproducible output.




Information forwarded to bug-gzip <at> gnu.org:
bug#32342; Package gzip. (Thu, 02 Aug 2018 08:32:02 GMT) Full text and rfc822 format available.

Message #11 received at 32342 <at> debbugs.gnu.org (full text, mbox):

From: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 32342 <at> debbugs.gnu.org
Subject: Re: bug#32342: [PATCH] Do not store mtime when compressing stdin
Date: Thu, 2 Aug 2018 10:31:32 +0200
[Message part 1 (text/plain, inline)]
On 2018-08-02 08:47, Paul Eggert wrote:
> Bernhard M. Wiedemann wrote:
>> This allows for reproducible output of
>> echo foo | gzip | md5sum
> 
> gzip -n already does that, so we don't need to change the gzip source
> code to get reproducible output.

I know that it does, but patching hundreds of packages [1] (including
ones still to be created) doing
tar -czf $file
to do instead
tar -c | gzip -n > $file

is a lot more effort than this one-line patch.
It is also worth noting that the first variant works without a shell
involved, so is easier+safer to call from other programs via system(3).

Is there a good reason to want the current time in .gz in this situation?
I found that .bz2 and .xz dont carry it either.

If there is a good reason, we could probably make it so that
echo foo | gzip -N
triggers the old default behaviour again.


Another possible approach would be to use the $SOURCE_DATE_EPOCH
environment variable's value instead of the current time,
as defined in
https://reproducible-builds.org/specs/source-date-epoch/

This would leave the default behaviour as is, but fix reproducible
builds for us, because the variable is set in our package build environment.


I'd just really like to have it reproducible by default.


[1]
https://github.com/performancecopilot/pcp/pull/540
https://github.com/L1L1/cardpeek/pull/97
https://github.com/singularityware/singularity/pull/1083
https://sourceforge.net/p/x3270/code/merge-requests/1/
https://stat.ethz.ch/pipermail/r-devel/2017-April/074165.html

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-gzip <at> gnu.org:
bug#32342; Package gzip. (Thu, 02 Aug 2018 09:06:02 GMT) Full text and rfc822 format available.

Message #14 received at 32342 <at> debbugs.gnu.org (full text, mbox):

From: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>
To: 32342 <at> debbugs.gnu.org
Cc: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>
Subject: [PATCH] Allow to override time_stamp with SOURCE_DATE_EPOCH
Date: Thu,  2 Aug 2018 11:03:59 +0200
in order to make builds reproducible.
See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.

---
1st draft: probably has some year-2038 problems left
---
 zip.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/zip.c b/zip.c
index 337932c..b5c8867 100644
--- a/zip.c
+++ b/zip.c
@@ -19,6 +19,7 @@
 
 #include <config.h>
 #include <ctype.h>
+#include <stdlib.h>
 
 #include "tailor.h"
 #include "gzip.h"
@@ -34,6 +35,7 @@ off_t header_bytes;   /* number of bytes in gzip header */
 int zip(in, out)
     int in, out;            /* input and output file descriptors */
 {
+    char *source_date_epoch;
     uch  flags = 0;         /* general purpose bit flags */
     ush  attr = 0;          /* ascii/binary flag */
     ush  deflate_flags = 0; /* pkzip -es, -en or -ex equivalent */
@@ -57,7 +59,10 @@ int zip(in, out)
     if (time_stamp.tv_nsec < 0)
       stamp = 0;
     else if (0 < time_stamp.tv_sec && time_stamp.tv_sec <= 0xffffffff)
-      stamp = time_stamp.tv_sec;
+      {
+        if ((source_date_epoch = getenv("SOURCE_DATE_EPOCH")) == NULL || (stamp = strtol(source_date_epoch, NULL, 10)) <= 0)
+          stamp = time_stamp.tv_sec;
+      }
     else
       {
         /* It's intended that timestamp 0 generates this warning,
-- 
2.16.4





Information forwarded to bug-gzip <at> gnu.org:
bug#32342; Package gzip. (Thu, 02 Aug 2018 16:00:02 GMT) Full text and rfc822 format available.

Message #17 received at 32342 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>, 32342 <at> debbugs.gnu.org
Subject: Re: bug#32342: [PATCH] Do not store mtime when compressing stdin
Date: Thu, 2 Aug 2018 08:59:26 -0700
On 08/02/2018 01:31 AM, Bernhard M. Wiedemann wrote:
> I know that it does, but patching hundreds of packages [1] (including
> ones still to be created) doing
> tar -czf $file
> to do instead
> tar -c | gzip -n > $file

This would be something that tar could and should address, I'd think.

> Is there a good reason to want the current time in .gz in this situation?

It seemed like a good idea at the time :-). At this point, it'd be an 
incompatible change to omit the timestamp.





Information forwarded to bug-gzip <at> gnu.org:
bug#32342; Package gzip. (Thu, 02 Aug 2018 16:02:01 GMT) Full text and rfc822 format available.

Message #20 received at 32342 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>, 32342 <at> debbugs.gnu.org
Subject: Re: bug#32342: [PATCH] Allow to override time_stamp with
 SOURCE_DATE_EPOCH
Date: Thu, 2 Aug 2018 09:01:05 -0700
On 08/02/2018 02:03 AM, Bernhard M. Wiedemann wrote:
> +        if ((source_date_epoch = getenv("SOURCE_DATE_EPOCH")) == NULL || (stamp = strtol(source_date_epoch, NULL, 10))

That won't work after 2038 on x32. Also, I'm leery of having environment 
variables affect standard utilities' behavior, though perhaps we could 
shoehorn this in.





Information forwarded to bug-gzip <at> gnu.org:
bug#32342; Package gzip. (Thu, 02 Aug 2018 16:13:01 GMT) Full text and rfc822 format available.

Message #23 received at 32342 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>, 32342 <at> debbugs.gnu.org
Subject: Re: bug#32342: [PATCH] Allow to override time_stamp with
 SOURCE_DATE_EPOCH
Date: Thu, 2 Aug 2018 18:12:02 +0200
On Thu, Aug 2, 2018 at 6:01 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 08/02/2018 02:03 AM, Bernhard M. Wiedemann wrote:
>>
>> +        if ((source_date_epoch = getenv("SOURCE_DATE_EPOCH")) == NULL ||
>> (stamp = strtol(source_date_epoch, NULL, 10))
>
>
> That won't work after 2038 on x32. Also, I'm leery of having environment
> variables affect standard utilities' behavior, though perhaps we could
> shoehorn this in.

I had the same initial negative reaction to adding support for an
envvar, but given the minor effect and worthy goal, I think the
balance has tipped in favor of allowing this one.




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Thu, 02 Aug 2018 17:50:02 GMT) Full text and rfc822 format available.

Notification sent to "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>:
bug acknowledged by developer. (Thu, 02 Aug 2018 17:50:02 GMT) Full text and rfc822 format available.

Message #28 received at 32342-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>, 32342-done <at> debbugs.gnu.org
Subject: Re: bug#32342: [PATCH] Allow to override time_stamp with
 SOURCE_DATE_EPOCH
Date: Thu, 2 Aug 2018 10:49:28 -0700
[Message part 1 (text/plain, inline)]
On 08/02/2018 09:12 AM, Jim Meyering wrote:
> I had the same initial negative reaction to adding support for an
> envvar, but given the minor effect and worthy goal, I think the
> balance has tipped in favor of allowing this one.

Yes, on further thought I'm inclined to go whole hog and just fix the 
durn thing without any fancy environment variables. In hindsight, that 
timestamp never should have been put in there when stdin is a pipe. I 
installed the attached patch, which should fix the problem. What do you 
think? Although this is not strictly backward-compatible and arguably 
violates the GNU coding guidelines, I can't imagine anybody really wants 
that timestamp in there (except perhaps for spy agencies trying to track 
what people are doing :-).

[0001-gzip-make-the-output-more-reproducible.patch (text/x-patch, attachment)]

Information forwarded to bug-gzip <at> gnu.org:
bug#32342; Package gzip. (Fri, 03 Aug 2018 12:50:03 GMT) Full text and rfc822 format available.

Message #31 received at 32342-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "Bernhard M. Wiedemann" <bwiedemann <at> suse.de>, 32342-done <at> debbugs.gnu.org
Subject: Re: bug#32342: [PATCH] Allow to override time_stamp with
 SOURCE_DATE_EPOCH
Date: Fri, 3 Aug 2018 14:49:24 +0200
On Thu, Aug 2, 2018 at 7:49 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 08/02/2018 09:12 AM, Jim Meyering wrote:
>> I had the same initial negative reaction to adding support for an
>> envvar, but given the minor effect and worthy goal, I think the
>> balance has tipped in favor of allowing this one.
>
> Yes, on further thought I'm inclined to go whole hog and just fix the durn
> thing without any fancy environment variables. In hindsight, that timestamp
> never should have been put in there when stdin is a pipe. I installed the
> attached patch, which should fix the problem. What do you think? Although
> this is not strictly backward-compatible and arguably violates the GNU
> coding guidelines, I can't imagine anybody really wants that timestamp in
> there (except perhaps for spy agencies trying to track what people are doing
> :-).

Thanks. With that, we'll have some NEWS for the next release. I'm not
planning a release at the moment, but do prefer to make a new one at
least once per year -- it's been almost 8 months.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 01 Sep 2018 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 344 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.