OK, I found out more. Since the problem first appears at "xargs -n 2872", I ran xargs again with the "-t" flag to print the command it executes, and noticed that the 2 missing files are exactly the last 2 on that command's file list:

grep -Il . "{ 2870 files }" ./apex/images/apex_ui/psd/apex_5_ui.ai ./apex/images/apex_ui/psd/apex-logo.ai

Now if I run:

[user@server folder]$ cat /tmp/cmd1
grep -Il . ./apex/images/apex_ui/psd/apex_5_ui.ai ./apex/images/apex_ui/psd/apex-logo.ai ... "{ 2870 files }"

[user@server folder]$ wc -c /tmp/cmd1
131049 /tmp/cmd1

[user@server folder]$ cat /tmp/cmd2
grep -Il . "{ 2870 files }" ./apex/images/apex_ui/psd/apex_5_ui.ai ./apex/images/apex_ui/psd/apex-logo.ai
[user@server folder]$ wc -c /tmp/cmd2
131049 /tmp/cmd2


[user@server folder]$ sh /tmp/cmd1 | wc -l
1072
[user@server folder]$ sh /tmp/cmd2 | wc -l
1070

In other words, the result changes depending on where those 2 files appear on grep's command line: with them first, grep lists 1072 files; with them last, it lists 1070.
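
To repeat this reordering test against the full list rather than hand-editing a command file, something like the following could be used. It is only a sketch: the function name move_to_front is made up for illustration, and it assumes one file name per line with no embedded newlines.

```shell
#!/bin/sh
# move_to_front LIST NAME...
# Hypothetical helper: print the given NAMEs first, then every other line
# of LIST in its original order.
move_to_front() {
    list=$1
    shift
    # Emit the chosen names first.
    printf '%s\n' "$@"
    # Feed the same names to awk on stdin ("-") as a skip set, then print
    # every line of LIST that is not in that set.
    printf '%s\n' "$@" | awk 'NR == FNR { skip[$0]; next } !($0 in skip)' - "$list"
}
```

For example, "move_to_front /tmp/file.list ./apex/images/apex_ui/psd/apex_5_ui.ai ./apex/images/apex_ui/psd/apex-logo.ai > /tmp/front.list" (the output name /tmp/front.list is arbitrary) gives a list that can be passed to grep the same way as /tmp/file.list.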

Is there some sort of debug flag I can run those 2 grep commands with, so that I can send the output back for analysis? The file list is exactly the same in both; only the order changes.

Thanks,
Rodrigo

On Fri, Sep 20, 2024 at 10:54 AM Rodrigo Jorge <rodrigoaraujorge@gmail.com> wrote:
I could reproduce the same issue without xargs, so I think we can take it out of the picture:

[user@server folder]$ find -type f -not -path "./.patch_storage/*" -not -name "tfa_setup" -print > /tmp/file.list
[user@server folder]$ wc -l /tmp/file.list
37443 /tmp/file.list

[user@server folder]$ cat /tmp/file.list | xargs -n 100 grep -Il '.' > /tmp/list1.list
[user@server folder]$ wc -l /tmp/list1.list
23405 /tmp/list1.list

[user@server folder]$ grep -Il '.' $(cat /tmp/file.list) > /tmp/list2.list
[user@server folder]$ wc -l /tmp/list2.list
23403 /tmp/list2.list

[user@server folder]$ diff /tmp/list1.list /tmp/list2.list
12268,12269d12267
< ./apex/images/apex_ui/psd/apex_5_ui.ai
< ./apex/images/apex_ui/psd/apex-logo.ai
[user@server folder]$

So running "grep -Il '.' $(cat /tmp/file.list)" also skips those 2 files. The other possibility is that skipping them is the correct result, and the batched xargs run is the one that wrongly includes them.

Those files are PDFs:

[user@server folder]$ file ./apex/images/apex_ui/psd/apex_5_ui.ai
./apex/images/apex_ui/psd/apex_5_ui.ai: PDF document, version 1.5
[user@server folder]$ file ./apex/images/apex_ui/psd/apex-logo.ai
./apex/images/apex_ui/psd/apex-logo.ai: PDF document, version 1.5

[user@server folder]$ head ./apex/images/apex_ui/psd/apex_5_ui.ai
%����1.5
<</Length 39582/Subtype/XML/Type/Metadata>>stream8 0 R 209 0 R]/ON[6 0 R 7 0 R 210 0 R]/Order 211 0 R/RBGroups[]>>/OCGs[6 0 R 7 0 R 5 0 R 208 0 R 210 0 R 209 0 R]>>/Pages 3 0 R/Type/Catalog>>
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:dc="http://purl.org/dc/elements/1.1/">
         <dc:format>application/pdf</dc:format>
         <dc:title>
            <rdf:Alt>
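
Since -I makes grep treat files it considers binary as non-matching, and NUL bytes are one of the things that make GNU grep consider a file binary, a quick check is to count the NUL bytes in each file. This is a minimal sketch, assuming POSIX tr and wc; the helper name count_nul is made up for illustration. It does not explain the order dependence, it only shows whether the files are candidates for being classified as binary at all.

```shell
#!/bin/sh
# count_nul FILE
# Hypothetical helper: print the number of NUL bytes in FILE.
count_nul() {
    # tr -cd '\000' deletes every byte except NUL; wc -c counts what is left.
    n=$(LC_ALL=C tr -cd '\000' < "$1" | wc -c)
    # Arithmetic expansion strips any padding some wc implementations add.
    echo $((n + 0))
}

# Report on the two files from this thread, if they exist in the current folder.
for f in ./apex/images/apex_ui/psd/apex_5_ui.ai ./apex/images/apex_ui/psd/apex-logo.ai; do
    if [ -f "$f" ]; then
        printf '%s: %s NUL bytes\n' "$f" "$(count_nul "$f")"
    fi
done
```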

I could also find exactly the point it breaks:

[user@server folder]$ cat /tmp/file.list | xargs -n 100 grep -Il '.' | wc -l
23405
[user@server folder]$ cat /tmp/file.list | xargs -n 1000 grep -Il '.' | wc -l
23405
[user@server folder]$ cat /tmp/file.list | xargs -n 2000 grep -Il '.' | wc -l
23405
[user@server folder]$ cat /tmp/file.list | xargs -n 2871 grep -Il '.' | wc -l
23405
[user@server folder]$ cat /tmp/file.list | xargs -n 2872 grep -Il '.' | wc -l
23403
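
The manual search for the breaking point above could also be automated with a binary search over the -n value. This is a rough sketch under some assumptions: the names LIST, count_for_n and first_bad_n are made up for illustration, the count at the upper bound must already differ from the count at the lower bound, and the change is assumed to happen only once between the two bounds, which is not guaranteed.

```shell
#!/bin/sh
# File list to test; defaults to the list used in this thread.
LIST=${LIST:-/tmp/file.list}

# count_for_n N
# Number of files grep reports when xargs passes at most N names per call.
count_for_n() {
    xargs -n "$1" grep -Il '.' < "$LIST" | wc -l
}

# first_bad_n LO HI
# Print the smallest N in (LO, HI] whose count differs from the count at LO.
# Invariant: LO always gives the reference count, HI never does.
first_bad_n() {
    lo=$1
    hi=$2
    good=$(count_for_n "$lo")
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if [ "$(count_for_n "$mid")" -eq "$good" ]; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$hi"
}
```

With the numbers above, "first_bad_n 100 5000" would be expected to print 2872, provided 5000 also gives the lower count.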

I will reply shortly with the strace findings.

On Fri, Sep 20, 2024 at 10:32 AM David G. Pickett <dgpickett@aol.com> wrote:
While the output may be bulky, on Linux you can try the strace command to see exactly what it is up to.  It will show the execvp() call, for instance.  You might need a bigger -s!

$ strace -f -v -s 262144 <YOUR_CMD>

On Thursday, September 19, 2024 at 10:29:30 AM EDT, Rodrigo Jorge <rodrigoaraujorge@gmail.com> wrote:


Hello. I'm trying to use grep to get the list of all non-binary files in a
given folder. I tried both the 2.20 and the 3.11 releases.

For some reason, grep gives 2 false negatives when the file list is huge.
The issue does not happen if I split the file list into smaller batches
with "xargs -n X".

Check below:

[opc@oradiff-core dbhome_1]$ grep -V
grep (GNU grep) 3.11
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others; see

[opc@oradiff-core dbhome_1]$ find -type f -not -path "./.patch_storage/*" -not -name "tfa_setup" -print0 2>> /tmp/error.list | xargs -0 -n 100 grep -Il '.' > /tmp/list1.list

[opc@oradiff-core dbhome_1]$ find -type f -not -path "./.patch_storage/*" -not -name "tfa_setup" -print0 2>> /tmp/error.list | xargs -0 grep -Il '.' > /tmp/list2.list

[opc@oradiff-core dbhome_1]$ diff /tmp/list1.list /tmp/list2.list
12268,12269d12267
< ./apex/images/apex_ui/psd/apex_5_ui.ai
< ./apex/images/apex_ui/psd/apex-logo.ai

[opc@oradiff-core dbhome_1]$ wc -l /tmp/list1.list /tmp/list2.list
  23397 /tmp/list1.list
  23395 /tmp/list2.list
  46792 total

The output should not show any difference.

The same issue was also reproduced in grep 2.20.

Thanks,
Rodrigo