GNU bug report logs - #16361
[wishlist] improve freshness checking in compile cache
Reported by: Zefram <zefram <at> fysh.org>
Date: Sun, 5 Jan 2014 23:45:11 UTC
Severity: wishlist
Tags: notabug, wontfix
Done: Mark H Weaver <mhw <at> netris.org>
Bug is archived. No further changes may be made.
Report forwarded to bug-guile <at> gnu.org: bug#16361; Package guile. (Sun, 05 Jan 2014 23:45:11 GMT)
Acknowledgement sent to Zefram <zefram <at> fysh.org>: New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 05 Jan 2014 23:45:12 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
The automatic cache of compiled versions of scripts in guile-2.0.9
identifies scripts mainly by name, and partially by mtime. This is not
actually sufficient: it is easily misled by a pathname that refers to
different files at different times. Test case:
$ echo '(display "aaa\n")' >t13
$ echo '(display "bbb\n")' >t14
$ guile-2.0 t13
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;; or pass the --no-auto-compile argument to disable.
;;; compiling /home/zefram/usr/guile/t13
;;; compiled /home/zefram/.cache/guile/ccache/2.0-LE-8-2.0/home/zefram/usr/guile/t13.go
aaa
$ mv t14 t13
$ guile-2.0 t13
aaa
You can see that the mtime is not fully used here: the cache is misapplied
even if there is a delay of seconds between the creations of the two
script files. The cache's mtime check will only notice a mismatch if
the script currently seen under the supplied name was modified later
than when the previous script was *compiled*.
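The failure mode described above can be replayed with a minimal Python sketch. It assumes the freshness test is a plain "cached file is not older than the source" comparison, as the report describes; `cache_looks_fresh` and `replay_test_case` are hypothetical names for illustration and are not Guile's actual code. Timestamps are set explicitly so that no real delay is needed.

```python
import os
import tempfile


def cache_looks_fresh(source_path, compiled_path):
    """Hypothetical 'not older than' check: accept the cached file if it was
    written at or after the source file's last modification."""
    source_mtime = os.stat(source_path).st_mtime_ns
    compiled_mtime = os.stat(compiled_path).st_mtime_ns
    return compiled_mtime >= source_mtime


def replay_test_case():
    """Replay the t13/t14 scenario with explicit timestamps instead of delays."""
    with tempfile.TemporaryDirectory() as workdir:
        t13 = os.path.join(workdir, "t13")
        t14 = os.path.join(workdir, "t14")
        cached = os.path.join(workdir, "t13.go")  # stand-in for the cache entry

        for path, text in ((t13, '(display "aaa\\n")'), (t14, '(display "bbb\\n")')):
            with open(path, "w") as handle:
                handle.write(text)

        # t13 modified at t=1s, t14 at t=2s, cache entry written at t=3s.
        os.utime(t13, ns=(1_000_000_000, 1_000_000_000))
        os.utime(t14, ns=(2_000_000_000, 2_000_000_000))
        with open(cached, "w") as handle:
            handle.write("compiled from the aaa script")
        os.utime(cached, ns=(3_000_000_000, 3_000_000_000))

        os.replace(t14, t13)  # same effect as: mv t14 t13
        return cache_looks_fresh(t13, cached)
```

Here `replay_test_case()` returns `True`: the renamed file keeps its older mtime, so the entry compiled from the "aaa" script is still accepted for a path that now holds the "bbb" script.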
Obviously, in this test case the cache could trivially distinguish the
two script files by looking at the inode numbers. On its own the inode
number isn't sufficient, but exact match on device, inode number, and
mtime would be far superior to the current behaviour, only going wrong
in the presence of deliberate timestamp manipulation. As a bonus, if
the cache were actually *keyed* by inode number and device, rather than
by pathname, it would retain the caching of compilation across renamings
of the script.
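A minimal Python sketch of the exact-match idea, under the assumption that the cache records a (device, inode, mtime) triple at compile time and later compares it for equality. `FileIdentity`, `identity_of`, `is_same_file` and `replay_with_identity` are hypothetical names used only for this illustration.

```python
import os
import tempfile
from typing import NamedTuple


class FileIdentity(NamedTuple):
    """The three stat fields proposed for the exact-match check."""
    device: int
    inode: int
    mtime_ns: int


def identity_of(path):
    """Read the identifying fields from a single stat call."""
    st = os.stat(path)
    return FileIdentity(st.st_dev, st.st_ino, st.st_mtime_ns)


def is_same_file(path, recorded):
    """Exact match on all three fields, rather than a 'newer than' test."""
    return identity_of(path) == recorded


def replay_with_identity():
    """Replay the t13/t14 rename and report the check before and after it."""
    with tempfile.TemporaryDirectory() as workdir:
        t13 = os.path.join(workdir, "t13")
        t14 = os.path.join(workdir, "t14")
        for path, text in ((t13, '(display "aaa\\n")'), (t14, '(display "bbb\\n")')):
            with open(path, "w") as handle:
                handle.write(text)

        recorded = identity_of(t13)  # what would be stored at compile time
        before = is_same_file(t13, recorded)
        os.replace(t14, t13)  # same effect as: mv t14 t13
        after = is_same_file(t13, recorded)
        return before, after
```

`replay_with_identity()` returns `(True, False)`: the two scripts exist side by side before the rename, so they have different inode numbers, and the mismatch is caught even when their timestamps are identical.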
Or, even better, the cache could be keyed by a cryptographic hash of the
file contents. This would be immune even to timestamp manipulation, and
would preserve the cached compilation even across the script being copied
to a fresh file or being edited and reverted. This would be a cache
worthy of the name. The only downside is the expense of computing the
hash, but I expect this is small compared to the expense of compilation.
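A content-keyed lookup could be sketched as follows in Python. The choice of SHA-256 and the names `content_key` and `demo_content_key` are assumptions made for illustration; the report does not name a particular hash function.

```python
import hashlib
import os
import shutil
import tempfile


def content_key(path, chunk_size=65536):
    """Return a hex digest of the file contents, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo_content_key():
    """Check the key across a copy, an edit, and a revert of the same script."""
    with tempfile.TemporaryDirectory() as workdir:
        original = os.path.join(workdir, "t13")
        copy = os.path.join(workdir, "t13-copy")
        with open(original, "w") as handle:
            handle.write('(display "aaa\\n")')
        key = content_key(original)

        shutil.copyfile(original, copy)  # fresh file, same contents
        copy_matches = content_key(copy) == key

        with open(original, "w") as handle:  # edit the script
            handle.write('(display "bbb\\n")')
        edit_differs = content_key(original) != key

        with open(original, "w") as handle:  # revert the edit
            handle.write('(display "aaa\\n")')
        revert_matches = content_key(original) == key

        return copy_matches, edit_differs, revert_matches
```

All three results are `True`: the key follows the contents, so timestamps, inode numbers and pathnames play no part. The cost is that the whole source file must be read on every lookup.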
Debian incarnation of this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=734178
-zefram
Changed bug title to '[wishlist] improve freshness checking in compile cache' from 'compile cache confused about file identity'
Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Wed, 15 Jan 2014 21:17:02 GMT)
Severity set to 'wishlist' from 'normal'
Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Wed, 15 Jan 2014 21:17:02 GMT)
Information forwarded to bug-guile <at> gnu.org: bug#16361; Package guile. (Wed, 01 Oct 2014 19:24:02 GMT)
Message #12 received at 16361 <at> debbugs.gnu.org (full text, mbox):
tags 16361 + notabug wontfix
close 16361
thanks
Zefram <zefram <at> fysh.org> writes:
> The automatic cache of compiled versions of scripts in guile-2.0.9
> identifies scripts mainly by name, and partially by mtime. This is not
> actually sufficient: it is easily misled by a pathname that refers to
> different files at different times. Test case:
>
> $ echo '(display "aaa\n")' >t13
> $ echo '(display "bbb\n")' >t14
> $ guile-2.0 t13
> ;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
> ;;; or pass the --no-auto-compile argument to disable.
> ;;; compiling /home/zefram/usr/guile/t13
> ;;; compiled /home/zefram/.cache/guile/ccache/2.0-LE-8-2.0/home/zefram/usr/guile/t13.go
> aaa
> $ mv t14 t13
> $ guile-2.0 t13
> aaa
>
> You can see that the mtime is not fully used here: the cache is misapplied
> even if there is a delay of seconds between the creations of the two
> script files. The cache's mtime check will only notice a mismatch if
> the script currently seen under the supplied name was modified later
> than when the previous script was *compiled*.
>
> Obviously, in this test case the cache could trivially distinguish the
> two script files by looking at the inode numbers. On its own the inode
> number isn't sufficient, but exact match on device, inode number, and
> mtime would be far superior to the current behaviour, only going wrong
> in the presence of deliberate timestamp manipulation. As a bonus, if
> the cache were actually *keyed* by inode number and device, rather than
> by pathname, it would retain the caching of compilation across renamings
> of the script.
>
> Or, even better, the cache could be keyed by a cryptographic hash of the
> file contents. This would be immune even to timestamp manipulation, and
> would preserve the cached compilation even across the script being copied
> to a fresh file or being edited and reverted. This would be a cache
> worthy of the name. The only downside is the expense of computing the
> hash, but I expect this is small compared to the expense of compilation.
You could make the same complaint about 'make', 'rsync', or any number
of other programs. It's true that a cryptographic hash would be more
robust, but it would also be considerably more expensive in the common
case where the .go file is already in the cache.
I don't think it's worth paying this cost every time a .go file is
loaded, to guard against the unlikely scenario you outlined above.
The mtime check is very widely used, and accepted practice.
I'm closing this ticket.
Mark
Added tag(s) notabug and wontfix.
Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Wed, 01 Oct 2014 19:24:02 GMT)
bug closed, send any further explanations to
16361 <at> debbugs.gnu.org and Zefram <zefram <at> fysh.org>
Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Wed, 01 Oct 2014 19:24:03 GMT)
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 30 Oct 2014 11:24:04 GMT)
bug unarchived.
Request was from Zefram <zefram <at> fysh.org> to control <at> debbugs.gnu.org. (Wed, 13 May 2015 10:46:02 GMT)
Information forwarded to bug-guile <at> gnu.org: bug#16361; Package guile. (Wed, 13 May 2015 11:08:02 GMT)
Message #23 received at 16361 <at> debbugs.gnu.org (full text, mbox):
Mark H Weaver wrote:
>You could make the same complaint about 'make', 'rsync', or any number
>of other programs.
Not really. make does use this type of freshness check, but it's used
in a specific situation where the freshness issue is immediately obvious
and is part of the program's visible primary concern. That's quite
unlike guile's compile cache, which as the name suggests is a cache.
It's meant to be unobtrusive, and the cache semantics are not a direct
part of the transaction that is ostensibly taking place, of running
a program that happens to be written in Scheme. Those circumstances,
of running an arbitrary program, are much broader than circumstances in
which make's freshness checks become relevant. make also gets a pass
from having always worked this way, whereas guile used to not cache
compilations. rsync, by contrast, does not use this type of freshness
checking; I believe it uses a hash mechanism.
> It's true that a cryptographic hash would be more
>robust, but it would also be considerably more expensive in the common
>case where the .go file is already in the cache.
>
>I don't think it's worth paying this cost every time
OK, you can rule that suggestion out, but I think you have erred in
jumping from that to wontfix on the general problem. You have not
addressed my prior suggestion of identifying programs by exact match on
device, inode number, and mtime. (File size could also be included.)
This freshness check is very cheap, because it's just a few fixed-size
fields from the stat structure, and you're already necessarily doing a
stat on the program file. Using the identifying fields as the cache
key even saves you a stat on the cached file. Although not quite as
effective as a hash comparison, it would be a huge practical improvement
over the current filename-and-inexact-mtime comparison.
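To illustrate how the stat fields could serve directly as the cache key, here is a minimal Python sketch. The key layout, the `.go` suffix on the derived name, and the names `cache_entry_path` and `demo_keyed_cache` are assumptions for this example, not an existing Guile interface.

```python
import os
import tempfile


def cache_entry_path(source_path, cache_root):
    """Derive the cache file name from device, inode, mtime and size.

    Only the source file is stat'ed; whether a matching entry exists is
    then decided by simply trying to open the derived path.
    """
    st = os.stat(source_path)  # the one stat the loader needs anyway
    fields = (st.st_dev, st.st_ino, st.st_mtime_ns, st.st_size)
    key = "-".join(format(field, "x") for field in fields)
    return os.path.join(cache_root, key + ".go")


def demo_keyed_cache():
    """Show that the key survives a rename and changes after an edit."""
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "t13")
        renamed = os.path.join(workdir, "t13-renamed")
        cache_root = os.path.join(workdir, "ccache")
        with open(script, "w") as handle:
            handle.write('(display "aaa\\n")')
        first = cache_entry_path(script, cache_root)

        os.replace(script, renamed)  # rename keeps device, inode, mtime, size
        survives_rename = cache_entry_path(renamed, cache_root) == first

        with open(renamed, "a") as handle:  # append, so the size changes
            handle.write("(newline)")
        changes_on_edit = cache_entry_path(renamed, cache_root) != first

        return survives_rename, changes_on_edit
```

Both results are `True`. Because the identifying fields are part of the file name, a stale entry is never opened, and no second stat on the cached file is needed to compare timestamps.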
-zefram
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 10 Jun 2015 11:24:06 GMT)