GNU bug report logs - #16361
[wishlist] improve freshness checking in compile cache

Package: guile

Reported by: Zefram <zefram <at> fysh.org>

Date: Sun, 5 Jan 2014 23:45:11 UTC

Severity: wishlist

Tags: notabug, wontfix

Done: Mark H Weaver <mhw <at> netris.org>

Bug is archived. No further changes may be made.

Report forwarded to bug-guile <at> gnu.org:
bug#16361; Package guile. (Sun, 05 Jan 2014 23:45:11 GMT)

Acknowledgement sent to Zefram <zefram <at> fysh.org>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 05 Jan 2014 23:45:12 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Zefram <zefram <at> fysh.org>
To: bug-guile <at> gnu.org
Subject: compile cache confused about file identity
Date: Sun, 5 Jan 2014 23:08:41 +0000

The automatic cache of compiled versions of scripts in guile-2.0.9
identifies scripts mainly by name, and partially by mtime.  This is not
actually sufficient: it is easily misled by a pathname that refers to
different files at different times.  Test case:

$ echo '(display "aaa\n")' >t13
$ echo '(display "bbb\n")' >t14
$ guile-2.0 t13
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /home/zefram/usr/guile/t13
;;; compiled /home/zefram/.cache/guile/ccache/2.0-LE-8-2.0/home/zefram/usr/guile/t13.go
aaa
$ mv t14 t13
$ guile-2.0 t13
aaa

You can see that the mtime is not fully used here: the cache is misapplied
even if there is a delay of seconds between the creations of the two
script files.  The cache's mtime check will only notice a mismatch if
the script currently seen under the supplied name was modified later
than when the previous script was *compiled*.
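
The behaviour described above can be modelled in a few lines of Python. This is only a sketch of the check as this report describes it, not Guile's actual code; the function names and the explicit timestamps are hypothetical, chosen so that the demo does not depend on clock resolution.

```python
import os
import tempfile

def cache_looks_fresh(source_path, compiled_path):
    """Loose check: the cached file counts as fresh unless the source
    was modified later than the compiled file."""
    return os.stat(source_path).st_mtime <= os.stat(compiled_path).st_mtime

def demo_stale_hit():
    with tempfile.TemporaryDirectory() as d:
        t13 = os.path.join(d, "t13")
        t14 = os.path.join(d, "t14")
        go = os.path.join(d, "t13.go")
        for path, text in ((t13, "aaa"), (t14, "bbb")):
            with open(path, "w") as f:
                f.write(text)
        os.utime(t13, (1000, 1000))   # first script
        os.utime(t14, (1005, 1005))   # second script, created seconds later
        with open(go, "w") as f:
            f.write("compiled form of aaa")
        os.utime(go, (1010, 1010))    # compilation happened after both
        os.replace(t14, t13)          # mv t14 t13 (rename keeps the mtime)
        # The name t13 now refers to the "bbb" script, yet the check passes.
        return cache_looks_fresh(t13, go)
```

In this model demo_stale_hit() reports the cache as fresh even though the cached file was compiled from a different script.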

Obviously, in this test case the cache could trivially distinguish the
two script files by looking at the inode numbers.  On its own the inode
number isn't sufficient, but exact match on device, inode number, and
mtime would be far superior to the current behaviour, only going wrong
in the presence of deliberate timestamp manipulation.  As a bonus, if
the cache were actually *keyed* by inode number and device, rather than
by pathname, it would retain the caching of compilation across renamings
of the script.
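
A minimal Python sketch of the exact-match idea, assuming the cache records a stamp alongside each compiled file. The names identity_stamp and is_fresh are hypothetical, and nothing here is taken from Guile.

```python
import os
import tempfile

def identity_stamp(path):
    """Device, inode number, and mtime of the file currently at `path`."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime_ns)

def is_fresh(source_path, recorded_stamp):
    """Exact match: any difference in device, inode, or mtime is a miss."""
    return identity_stamp(source_path) == recorded_stamp

def demo_exact_match():
    with tempfile.TemporaryDirectory() as d:
        t13 = os.path.join(d, "t13")
        t14 = os.path.join(d, "t14")
        for path, text in ((t13, "aaa"), (t14, "bbb")):
            with open(path, "w") as f:
                f.write(text)
        recorded = identity_stamp(t13)        # stored at compile time
        before = is_fresh(t13, recorded)      # same file: cache hit
        os.replace(t14, t13)                  # mv t14 t13
        after = is_fresh(t13, recorded)       # different inode: cache miss
        return before, after
```

Because both files exist at the same time before the rename, they necessarily have different inode numbers, so the second check fails regardless of the timestamps.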

Or, even better, the cache could be keyed by a cryptographic hash of the
file contents.  This would be immune even to timestamp manipulation, and
would preserve the cached compilation even across the script being copied
to a fresh file or being edited and reverted.  This would be a cache
worthy of the name.  The only downside is the expense of computing the
hash, but I expect this is small compared to the expense of compilation.
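
The content-hash variant could look roughly like the following Python sketch. SHA-256 is an assumption (the report does not name a hash), the dictionary stands in for the on-disk cache, and compile_fn is a hypothetical placeholder for the real compiler.

```python
import hashlib

def content_key(source: bytes) -> str:
    """Cache key derived only from the script's contents."""
    return hashlib.sha256(source).hexdigest()

def compile_cached(source: bytes, compile_fn, cache: dict):
    """Compile `source` unless identical contents were compiled before."""
    key = content_key(source)
    if key not in cache:
        cache[key] = compile_fn(source)
    return cache[key]
```

With this keying, a copied or edited-and-reverted script maps to the same entry, at the price of reading and hashing the whole file on every load.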

Debian incarnation of this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734178

-zefram




Changed bug title to '[wishlist] improve freshness checking in compile cache' from 'compile cache confused about file identity'. Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Wed, 15 Jan 2014 21:17:02 GMT)

Severity set to 'wishlist' from 'normal'. Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Wed, 15 Jan 2014 21:17:02 GMT)

Information forwarded to bug-guile <at> gnu.org:
bug#16361; Package guile. (Wed, 01 Oct 2014 19:24:02 GMT)

Message #12 received at 16361 <at> debbugs.gnu.org:

From: Mark H Weaver <mhw <at> netris.org>
To: Zefram <zefram <at> fysh.org>
Cc: 16361 <at> debbugs.gnu.org, request <at> debbugs.gnu.org
Subject: Re: bug#16361: compile cache confused about file identity
Date: Wed, 01 Oct 2014 15:22:58 -0400

tags 16361 + notabug wontfix
close 16361
thanks

Zefram <zefram <at> fysh.org> writes:

> The automatic cache of compiled versions of scripts in guile-2.0.9
> identifies scripts mainly by name, and partially by mtime.  This is not
> actually sufficient: it is easily misled by a pathname that refers to
> different files at different times.  Test case:
>
> $ echo '(display "aaa\n")' >t13
> $ echo '(display "bbb\n")' >t14
> $ guile-2.0 t13
> ;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
> ;;;       or pass the --no-auto-compile argument to disable.
> ;;; compiling /home/zefram/usr/guile/t13
> ;;; compiled /home/zefram/.cache/guile/ccache/2.0-LE-8-2.0/home/zefram/usr/guile/t13.go
> aaa
> $ mv t14 t13
> $ guile-2.0 t13
> aaa
>
> You can see that the mtime is not fully used here: the cache is misapplied
> even if there is a delay of seconds between the creations of the two
> script files.  The cache's mtime check will only notice a mismatch if
> the script currently seen under the supplied name was modified later
> than when the previous script was *compiled*.
>
> Obviously, in this test case the cache could trivially distinguish the
> two script files by looking at the inode numbers.  On its own the inode
> number isn't sufficient, but exact match on device, inode number, and
> mtime would be far superior to the current behaviour, only going wrong
> in the presence of deliberate timestamp manipulation.  As a bonus, if
> the cache were actually *keyed* by inode number and device, rather than
> by pathname, it would retain the caching of compilation across renamings
> of the script.
>
> Or, even better, the cache could be keyed by a cryptographic hash of the
> file contents.  This would be immune even to timestamp manipulation, and
> would preserve the cached compilation even across the script being copied
> to a fresh file or being edited and reverted.  This would be a cache
> worthy of the name.  The only downside is the expense of computing the
> hash, but I expect this is small compared to the expense of compilation.

You could make the same complaint about 'make', 'rsync', or any number
of other programs.  It's true that a cryptographic hash would be more
robust, but it would also be considerably more expensive in the common
case where the .go file is already in the cache.

I don't think it's worth paying this cost every time a .go file is
loaded, to guard against the unlikely scenario you outlined above.

The mtime check is very widely used, and accepted practice.

I'm closing this ticket.

      Mark




Added tag(s) notabug and wontfix. Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Wed, 01 Oct 2014 19:24:02 GMT)

bug closed, send any further explanations to 16361 <at> debbugs.gnu.org and Zefram <zefram <at> fysh.org>. Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Wed, 01 Oct 2014 19:24:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 30 Oct 2014 11:24:04 GMT)

bug unarchived. Request was from Zefram <zefram <at> fysh.org> to control <at> debbugs.gnu.org. (Wed, 13 May 2015 10:46:02 GMT)

Information forwarded to bug-guile <at> gnu.org:
bug#16361; Package guile. (Wed, 13 May 2015 11:08:02 GMT)

Message #23 received at 16361 <at> debbugs.gnu.org:

From: Zefram <zefram <at> fysh.org>
To: 16361 <at> debbugs.gnu.org
Subject: Re: bug#16361: compile cache confused about file identity
Date: Wed, 13 May 2015 12:07:39 +0100

Mark H Weaver wrote:
>You could make the same complaint about 'make', 'rsync', or any number
>of other programs.

Not really.  make does use this type of freshness check, but it's used
in a specific situation where the freshness issue is immediately obvious
and is part of the program's visible primary concern.  That's quite
unlike guile's compile cache, which as the name suggests is a cache.
It's meant to be unobtrusive, and the cache semantics are not a direct
part of the transaction that is ostensibly taking place, of running
a program that happens to be written in Scheme.  Those circumstances,
of running an arbitrary program, are much broader than circumstances in
which make's freshness checks become relevant.  make also gets a pass
from having always worked this way, whereas guile used to not cache
compilations.  rsync, by contrast, does not use this type of freshness
checking; I believe it uses a hash mechanism.

>                    It's true that a cryptographic hash would be more
>robust, but it would also be considerably more expensive in the common
>case where the .go file is already in the cache.
>
>I don't think it's worth paying this cost every time

OK, you can rule that suggestion out, but I think you have erred in
jumping from that to wontfix on the general problem.  You have not
addressed my prior suggestion of identifying programs by exact match on
device, inode number, and mtime.  (File size could also be included.)
This freshness check is very cheap, because it's just a few fixed-size
fields from the stat structure, and you're already necessarily doing a
stat on the program file.  Using the identifying fields as the cache
key even saves you a stat on the cached file.  Although not quite as
effective as a hash comparison, it would be a huge practical improvement
over the current filename-and-inexact-mtime comparison.
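
To illustrate using the identifying fields as the cache key, here is a rough Python sketch. The file-name layout, the helper names, and the idea of a flat cache directory are all hypothetical; Guile's real cache is laid out by source pathname, as the transcript earlier in this report shows.

```python
import os
import tempfile

def cache_path_for(source_path, cache_dir):
    """Derive the cached file's name from the source file's stat fields,
    so a lookup only needs to test whether that name exists."""
    st = os.stat(source_path)  # the one stat that is needed anyway
    name = "{:x}-{:x}-{:x}-{:x}.go".format(
        st.st_dev, st.st_ino, st.st_mtime_ns, st.st_size)
    return os.path.join(cache_dir, name)

def demo_keying():
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.scm")
        b = os.path.join(d, "b.scm")
        other = os.path.join(d, "other.scm")
        for path, text in ((a, "aaa"), (other, "bbb")):
            with open(path, "w") as f:
                f.write(text)
        key_a = cache_path_for(a, d)
        key_other = cache_path_for(other, d)
        os.rename(a, b)                       # rename keeps inode and mtime
        key_b = cache_path_for(b, d)
        return key_a == key_b, key_a != key_other
```

No second stat on the cached file is needed here: an entry with the derived name either exists or it does not, and no timestamps are compared.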

-zefram




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 10 Jun 2015 11:24:06 GMT)

This bug report was last modified 10 years and 72 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.