GNU bug report logs - #51433
cp 9.0 sometimes fails with SEEK_DATA/SEEK_HOLE

Previous Next

Package: coreutils;

Reported by: Janne Heß <janne+coreutils <at> hess.ooo>

Date: Wed, 27 Oct 2021 11:56:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Full log


Message #17 received at 51433 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Janne Heß <janne+coreutils <at> hess.ooo>,
 51433 <at> debbugs.gnu.org
Subject: Re: bug#51433: cp 9.0 sometimes fails with SEEK_DATA/SEEK_HOLE
Date: Thu, 28 Oct 2021 12:11:41 -0700
On 10/28/21 06:54, Pádraig Brady wrote:

> Further debugging from Nix folks suggest ZFS was in consideration always,
> as invalid artifacts were written to a central cache from ZFS backed hosts.
> So we should at least change the comment in the patch to only mention ZFS.

Yes, that sounds reasonable.

This ZFS bug sounds pretty serious, though. Apparently it affects star 
and other programs too. I'm not sure we should attempt to work around it 
in coreutils, if the workarounds penalize everybody not using ZFS.

Is it cheap to check whether a file is actually in a ZFS filesystem? 
(Don't know how this'd work with loopback mounts, NFS, etc.) If so, it 
might be better to simply fdatasync (or even fsync) every input file 
that's on ZFS, until we know the ZFS bugs are fixed.

In theory we could fdatasync/fsync every input file on every platform. 
It'd be a shame to do that, though; that would slow down everybody 
merely to work around this ZFS bug.

> Also it seems like fsync() does avoid the ZFS issue as mentioned in:
> https://github.com/openzfs/zfs/issues/11900

Yes. I'm hoping that fdatasync suffices as it's lighter-weight. But if 
fsync is needed we can use fsync.

> BTW I'm slightly worried about retrying SEEK_DATA as
> FreeBSD 9.1 has a bug with large sparse files at least
> where it takes ages for SEEK_DATA to return:
>    36.13290615 lseek(3,0x0,SEEK_DATA)         = -32768 (0xffff8000)
> If ENXIO is not set in that case, then there is no issue.

Wait - lseek returns a number less than -1?! We could easily check for 
that FreeBSD bug, perhaps as an independent patch; this shouldn't 
require any extra syscalls.

Also please see 
<https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256205>. It appears 
that ZFS has significant bugs in this area on FreeBSD, bugs that haven't 
been fixed yet. That bug report does suggest that an fsync (and I hope 
fdatasync) works around the bugs.

> Also I'm not sure restricting sync to ENXIO is general enough,
> as an strace from a problematic cp, from the github issue above is:
>    lseek(3, 0, SEEK_DATA)            = 0
>    lseek(3, 0, SEEK_HOLE)            = 131072
>    lseek(3, 0, SEEK_SET)             = 0
>    read(3, "\177ELF\2\1"..., 131072) = 131072
>    write(4, "\177ELF\2\"..., 131072) = 131072
>    lseek(3, 131072, SEEK_DATA)       = -1 ENXIO
>    ftruncate(4, 3318813)             = 0

How about if we also do an fdatasync+retry after that 2nd lseek yields 
ENXIO? Would that suffice to work around the ZFS bug? Would it be too 
much of a performance penalty for non-ZFS users?




This bug report was last modified 3 years and 173 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.