GNU bug report logs - #25146
grep unusable on mingw - SAME_INODE woes


Package: grep

Reported by: Bruno Haible <bruno <at> clisp.org>

Date: Fri, 9 Dec 2016 15:32:02 UTC

Severity: wishlist



Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Bruno Haible <bruno <at> clisp.org>
To: bug-grep <at> gnu.org
Subject: grep unusable on mingw - SAME_INODE woes
Date: Fri, 09 Dec 2016 16:30:45 +0100
> grep snapshot:
>   http://meyering.net/grep/grep-ss.tar.xz      1.4 MB
>   http://meyering.net/grep/grep-ss.tar.xz.sig
>   http://meyering.net/grep/grep-2.26.39-ae3f.tar.xz

This release, built for mingw, is hardly usable:
  - 33 out of 107 tests fail,
  - A simple "grep.exe o xx > yy" fails with error
    grep.exe: input file 'xx' is also the output

More details:
- This happens both in a Cygwin mintty.exe window and in a cmd.exe window.
- It's the same for 32-bit mingw builds and 64-bit mingw builds.
  (recipe: http://git.savannah.gnu.org/gitweb/?p=gperf.git;a=blob_plain;f=README.windows;hb=HEAD )
- The error is signalled in grep.c:1874.
  At this point, 'st' (of type 'struct _stat64') contains
    { st_dev = 0, st_ino = 0,
      st_mode = 0x81B6 = _S_IFREG | _S_IREAD | _S_IWRITE | 0x36,
      st_nlink = 1,
      st_uid = 0, st_gid = 0, st_rdev = 0, st_size = 4,
      st_atime = 1481099615, st_mtime = 1481099615, st_ctime = 1481099615 }
  Obviously, such a struct cannot reliably distinguish two different regular files.
  In other words, SAME_INODE cannot work.
- So, how do you determine identity of files in Windows?
  http://stackoverflow.com/questions/562701/best-way-to-determine-if-two-path-reference-to-same-file-in-windows
  But even that answer is not quite right: BY_HANDLE_FILE_INFORMATION
  is not sufficient, because it contains only a 64-bit identifier
  (nFileIndexHigh/nFileIndexLow) for a file.
  See https://msdn.microsoft.com/en-us/library/windows/desktop/aa363788(v=vs.85).aspx
  The best approach is to use GetFileInformationByHandleEx to produce a
  FILE_ID_INFO, which carries a 128-bit file ID plus the volume serial number.
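
To make the two points above concrete, here is a minimal sketch (it is not
the attached patch). The SAME_INODE definition is a simplified stand-in for
the gnulib macro, and the names stat_identity_is_blank and
same_file_by_handle are hypothetical, chosen only for this illustration.
The Windows part assumes a build with -D_WIN32_WINNT=0x0602 or higher, so
that the headers declare FILE_ID_INFO.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Simplified stand-in for gnulib's SAME_INODE (assumption).  */
#define SAME_INODE(a, b) \
  ((a).st_ino == (b).st_ino && (a).st_dev == (b).st_dev)

/* Hypothetical helper: nonzero when (st_dev, st_ino) carries no
   information, as in the 'struct _stat64' dump above.  */
int
stat_identity_is_blank (const struct stat *st)
{
  assert (st != NULL);
  return st->st_dev == 0 && st->st_ino == 0;
}

#if defined _WIN32 && !defined __CYGWIN__
# include <windows.h>
# if _WIN32_WINNT >= 0x0602
/* Hypothetical helper: 1 if both handles refer to the same file,
   0 if they differ, -1 if the file ID could not be queried.  */
int
same_file_by_handle (HANDLE a, HANDLE b)
{
  FILE_ID_INFO ia, ib;
  if (!GetFileInformationByHandleEx (a, FileIdInfo, &ia, sizeof ia)
      || !GetFileInformationByHandleEx (b, FileIdInfo, &ib, sizeof ib))
    return -1;
  /* Compare the volume first, then the 128-bit file ID.  */
  return (ia.VolumeSerialNumber == ib.VolumeSerialNumber
          && memcmp (&ia.FileId, &ib.FileId, sizeof ia.FileId) == 0);
}
# endif
#endif
```

With two zero-filled 'struct stat' values that differ only in st_size,
SAME_INODE reports them as identical, which is the false positive seen with
"grep.exe o xx > yy".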

Find attached a proof-of-concept patch. (Really rough - needs
-D_WIN32_WINNT=_WIN32_WINNT_WIN8, and lacks good error handling.)

With it, I get:
$ ./grep.exe o xx > yy
$ ./grep.exe o xx > xx
grep.exe: input file 'xx' is also the output

That is, now the detection of identical regular files works.

How can we go forward from here? I would propose a gnulib module that defines
a data structure that combines a 'struct stat' with the FILE_ID_INFO for native
Windows, and rebase the 'same-inode' module on it.
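
A rough sketch of what such a data structure might look like, only to make
the proposal easier to discuss. All names here (struct file_identity,
file_identity_from_fd, file_identity_same) are hypothetical and not part of
gnulib; the Windows branch assumes _WIN32_WINNT >= 0x0602 and has only
minimal error handling.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#if defined _WIN32 && !defined __CYGWIN__
# include <io.h>
# include <windows.h>
#endif

/* Hypothetical: a 'struct stat' plus the native Windows file ID.  */
struct file_identity
{
  struct stat st;
  unsigned long long volume;  /* FILE_ID_INFO.VolumeSerialNumber */
  unsigned char id[16];       /* FILE_ID_INFO.FileId.Identifier */
  int has_win_id;             /* nonzero if volume/id are valid */
};

/* Fill *OUT for the open descriptor FD.  Return 0 on success, -1 if
   fstat fails.  */
int
file_identity_from_fd (int fd, struct file_identity *out)
{
  assert (out != NULL);
  memset (out, 0, sizeof *out);
  if (fstat (fd, &out->st) != 0)
    return -1;
#if defined _WIN32 && !defined __CYGWIN__ && _WIN32_WINNT >= 0x0602
  {
    FILE_ID_INFO info;
    HANDLE h = (HANDLE) _get_osfhandle (fd);
    if (h != INVALID_HANDLE_VALUE
        && GetFileInformationByHandleEx (h, FileIdInfo, &info, sizeof info))
      {
        out->volume = info.VolumeSerialNumber;
        memcpy (out->id, info.FileId.Identifier, sizeof out->id);
        out->has_win_id = 1;
      }
  }
#endif
  return 0;
}

/* Compare by Windows file ID when either side has one; otherwise fall
   back to the (st_dev, st_ino) comparison.  */
int
file_identity_same (const struct file_identity *a,
                    const struct file_identity *b)
{
  assert (a != NULL && b != NULL);
  if (a->has_win_id || b->has_win_id)
    return (a->has_win_id && b->has_win_id
            && a->volume == b->volume
            && memcmp (a->id, b->id, sizeof a->id) == 0);
  return a->st.st_ino == b->st.st_ino && a->st.st_dev == b->st.st_dev;
}
```

The point of keeping the ID next to the 'struct stat' instead of inside it
is that only callers who care about file identity pay for the extra query.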

The other approach, to override mingw's 'struct stat' and stat/fstat/lstat()
functions, would imply a performance hit to all stat calls, even those that
don't want to access the st_ino field.

Bruno

[grep-same-inode-fix.diff (text/x-patch, attachment)]

This bug report was last modified 8 years and 180 days ago.


