GNU bug report logs - #25146
grep unusable on mingw - SAME_INODE woes

Package: grep

Reported by: Bruno Haible <bruno <at> clisp.org>

Date: Fri, 9 Dec 2016 15:32:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 25146 AT debbugs.gnu.org.

Report forwarded to bug-grep <at> gnu.org:
bug#25146; Package grep. (Fri, 09 Dec 2016 15:32:02 GMT)

Acknowledgement sent to Bruno Haible <bruno <at> clisp.org>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 09 Dec 2016 15:32:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Bruno Haible <bruno <at> clisp.org>
To: bug-grep <at> gnu.org
Subject: grep unusable on mingw - SAME_INODE woes
Date: Fri, 09 Dec 2016 16:30:45 +0100
> grep snapshot:
>   http://meyering.net/grep/grep-ss.tar.xz      1.4 MB
>   http://meyering.net/grep/grep-ss.tar.xz.sig
>   http://meyering.net/grep/grep-2.26.39-ae3f.tar.xz

This release, built for mingw, is hardly usable:
  - 33 out of 107 tests fail,
  - A simple "grep.exe o xx > yy" fails with error
    grep.exe: input file 'xx' is also the output

More details:
- This happens both in a Cygwin mintty.exe window and in a cmd.exe window.
- It's the same for 32-bit mingw builds and 64-bit mingw builds
  (recipe: http://git.savannah.gnu.org/gitweb/?p=gperf.git;a=blob_plain;f=README.windows;hb=HEAD )
- The error is signalled in grep.c:1874.
  At this point, 'st' (of type 'struct _stat64') contains
    { st_dev = 0, st_ino = 0,
      st_mode = 0x81B6 = _S_IFREG | _S_IREAD | _S_IWRITE | 0x36,
      st_nlink = 1,
      st_uid = 0, st_gid = 0, st_rdev = 0, st_size = 4,
      st_atime = 1481099615, st_mtime = 1481099615, st_ctime = 1481099615 }
  Obviously, such a struct cannot reliably distinguish two different regular files.
  In other words, SAME_INODE cannot work.
- So, how do you determine identity of files in Windows?
  http://stackoverflow.com/questions/562701/best-way-to-determine-if-two-path-reference-to-same-file-in-windows
  But even this is wrong: a BY_HANDLE_FILE_INFORMATION is not sufficient,
  because it contains only a 64-bit identifier for the file (nFileIndexHigh
  and nFileIndexLow).
  See https://msdn.microsoft.com/en-us/library/windows/desktop/aa363788(v=vs.85).aspx
  The best approach is to use GetFileInformationByHandleEx to produce a
  FILE_ID_INFO, which carries a 128-bit file ID.
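
The failure mode described in the list above can be reproduced without Windows. The following is a minimal sketch, assuming the classic definition of SAME_INODE (same st_dev and same st_ino); the helper names mingw_like_stat and zeroed_stats_collide are hypothetical and do not come from grep or gnulib.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Classic POSIX-style identity test: same device and same inode number.
   Assumed here for illustration; see gnulib's same-inode.h for the real one.  */
#define SAME_INODE(a, b) \
  ((a).st_ino == (b).st_ino && (a).st_dev == (b).st_dev)

/* Hypothetical helper: build a 'struct stat' whose identity fields look
   like the mingw dump above (st_dev = 0, st_ino = 0).  */
static struct stat
mingw_like_stat (off_t size)
{
  struct stat st;
  memset (&st, 0, sizeof st);
  st.st_nlink = 1;
  st.st_size = size;
  return st;
}

/* Hypothetical demo: two unrelated files compare as identical because
   both identity fields are zero.  Returns nonzero in that case.  */
static int
zeroed_stats_collide (void)
{
  struct stat in = mingw_like_stat (4);   /* stands in for 'xx' */
  struct stat out = mingw_like_stat (0);  /* stands in for 'yy' */
  return SAME_INODE (in, out);
}
```

Since every regular file yields the same (0, 0) pair, any input/output pair looks like the same file, which matches the spurious "is also the output" error.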

Find attached a proof-of-concept patch. (Really rough - needs
-D_WIN32_WINNT=_WIN32_WINNT_WIN8, and lacks good error handling.)

With it, I get:
$ ./grep.exe o xx > yy
$ ./grep.exe o xx > xx
grep.exe: input file 'xx' is also the output

That is, now the detection of identical regular files works.

How can we go forward from here? I would propose a gnulib module that defines
a data structure that combines a 'struct stat' with the FILE_ID_INFO for native
Windows, and rebase the 'same-inode' module on it.
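
A rough sketch of what such a combined structure might look like follows. It is not the attached patch and not an existing gnulib API: the names struct file_identity, file_identity_from_fd and file_identity_same are hypothetical. The Windows branch assumes _WIN32_WINNT is at least _WIN32_WINNT_WIN8 so that FILE_ID_INFO is declared; other platforms fall back to st_dev and st_ino.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

#if defined _WIN32 && !defined __CYGWIN__
# include <windows.h>   /* FILE_ID_INFO, GetFileInformationByHandleEx */
# include <io.h>        /* _get_osfhandle */
#endif

/* Hypothetical type: a 'struct stat' plus the extra identity data that
   native Windows needs.  */
struct file_identity
{
  struct stat st;
#if defined _WIN32 && !defined __CYGWIN__
  FILE_ID_INFO id;      /* 64-bit volume serial number + 128-bit file ID */
#endif
};

/* Fill *FI for the open descriptor FD.  Return 0 on success, -1 on failure.  */
static int
file_identity_from_fd (int fd, struct file_identity *fi)
{
  if (fstat (fd, &fi->st) != 0)
    return -1;
#if defined _WIN32 && !defined __CYGWIN__
  HANDLE h = (HANDLE) _get_osfhandle (fd);
  if (h == INVALID_HANDLE_VALUE
      || !GetFileInformationByHandleEx (h, FileIdInfo, &fi->id, sizeof fi->id))
    return -1;
#endif
  return 0;
}

/* Return nonzero if A and B describe the same file.  */
static int
file_identity_same (const struct file_identity *a,
                    const struct file_identity *b)
{
#if defined _WIN32 && !defined __CYGWIN__
  return (a->id.VolumeSerialNumber == b->id.VolumeSerialNumber
          && memcmp (&a->id.FileId, &b->id.FileId, sizeof a->id.FileId) == 0);
#else
  return a->st.st_ino == b->st.st_ino && a->st.st_dev == b->st.st_dev;
#endif
}
```

A caller would fill one struct file_identity per open descriptor and compare them with file_identity_same instead of applying SAME_INODE to the bare 'struct stat' values.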

The other approach, to override mingw's 'struct stat' and stat/fstat/lstat()
functions, would imply a performance hit to all stat calls, even those that
don't want to access the st_ino field.

Bruno

[grep-same-inode-fix.diff (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#25146; Package grep. (Fri, 09 Dec 2016 16:35:02 GMT)

Message #8 received at 25146 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Bruno Haible <bruno <at> clisp.org>, 25146 <at> debbugs.gnu.org
Subject: Re: bug#25146: grep unusable on mingw - SAME_INODE woes
Date: Fri, 9 Dec 2016 08:34:32 -0800
On 12/09/2016 07:30 AM, Bruno Haible wrote:
>
> How can we go forward from here? I would propose a gnulib module that defines
> a data structure that combines a 'struct stat' with the FILE_ID_INFO for native
> Windows, and rebase the 'same-inode' module on it.
>
> The other approach, to override mingw's 'struct stat' and stat/fstat/lstat()
> functions, would imply a performance hit to all stat calls, even those that
> don't want to access the st_ino field.

For grep's purposes a simple workaround is to have SAME_INODE always
return 0 on MinGW, so I installed the attached patch into Gnulib. This 
isn't perfect (it means MinGW grep won't detect that the input and 
output are the same file), but it should be good enough to fix the 
glaring bugs and to conform to POSIX.
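
The workaround described above could look roughly like the sketch below. This is an illustration, not the text of the attached Gnulib patch: the exact preprocessor condition is an assumption, and input_is_output is a hypothetical caller loosely modelled on grep's "input file is also the output" check.

```c
#include <sys/types.h>
#include <sys/stat.h>

#if defined _WIN32 && !defined __CYGWIN__
/* Native Windows (MinGW): st_dev and st_ino carry no identity, so never
   claim that two files are the same.  */
# define SAME_INODE(a, b) 0
#else
/* Elsewhere: same device and same inode number.  */
# define SAME_INODE(a, b) \
    ((a).st_ino == (b).st_ino && (a).st_dev == (b).st_dev)
#endif

/* Hypothetical caller: nonzero if the regular file described by IN is
   also the file described by OUT.  */
static int
input_is_output (const struct stat *in, const struct stat *out)
{
  return S_ISREG (in->st_mode) && SAME_INODE (*in, *out);
}
```

With this definition a MinGW build never reports the error, so "grep o xx > yy" works again, at the cost of also accepting "grep o xx > xx" silently.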

Although it might be helpful to have a fancier module that does the work 
of SAME_INODE but does it more accurately on MinGW, I'm not sure it's 
worth the hassle. A lot of code assumes that 'struct stat' suffices to 
identify files, and it would be a pain to clutter it with another struct 
of our own design that contains a 'struct stat' as a component. Even if 
we had another module like that, we'd need to keep SAME_INODE for the 
benefit of programs that cannot easily adopt the new struct.

It seems more plausible to override MinGW's struct stat and stat/etc. 
functions. To my mind it's OK to take a performance hit in the interest 
of portability. The performance hit would occur only on programs that 
need to deduce the equivalent of SAME_INODE.
[0001-same-inode-port-to-MinGW.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#25146; Package grep. (Fri, 09 Dec 2016 19:13:01 GMT)

Message #11 received at 25146 <at> debbugs.gnu.org:

From: Bruno Haible <bruno <at> clisp.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 25146 <at> debbugs.gnu.org
Subject: Re: bug#25146: grep unusable on mingw - SAME_INODE woes
Date: Fri, 09 Dec 2016 18:42:20 +0100
> I installed the attached patch into Gnulib. This 
> isn't perfect (it means MinGW grep won't detect that the input and 
> output are the same file), but it should be good enough to fix the 
> glaring bugs and to conform to POSIX.

Thanks, Paul. Yes, it surely fixes the immediate issue. I agree.

Bruno

Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sun, 18 Dec 2016 21:40:02 GMT)

This bug report was last modified 8 years and 180 days ago.
