Thanks, Paul.

I tried to clone and build your latest changes from the Savannah repository, but building from the master branch apparently needs extra prerequisites that are beyond my knowledge, so I wasn't able to validate the fix. Anyway, thanks for the correction and the fix!

Regards,
Rodrigo

On Sun, Sep 22, 2024 at 3:39 AM Paul Eggert <eggert@cs.ucla.edu> wrote:
On 2024-09-20 22:41, Paul Eggert wrote:
> I have the sneaking suspicion that the script is assuming properties of
> 'grep' that are not documented and that are not guaranteed.

Looking into the code a bit more, I can see some places where that is
happening.

A couple of things.

First, grep 3.11 uses buffer sizes that depend on earlier files it has
scanned, and this affects whether grep decides that later files are
binary. This can lead to the sort of confusion you mentioned. grep
should not grow its buffer for later files merely because earlier files
had very long lines, since huge buffers can hurt performance; so I
installed the first attached patch into the development repository on
Savannah to fix that. As a side effect, this may fix the symptoms you
observed.
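To make the order dependence concrete, here is a toy model in Python. It is NOT grep's actual algorithm. It simply assumes a scanner that calls a file "binary" if a NUL byte appears within its first buffer-full, and that enlarges its buffer to fit the longest line it has seen. The function name classify_files and its parameters are hypothetical.

```python
# Toy model (not grep's real code): a file counts as "binary" if a NUL byte
# appears within the first `bufsize` bytes the scanner looks at.
def classify_files(files, initial_bufsize=8, carry_over=True):
    """Classify (name, data) pairs in order; return {name: "text"|"binary"}."""
    bufsize = initial_bufsize
    result = {}
    for name, data in files:
        if not carry_over:
            # Start every file with a fresh, small buffer.
            bufsize = initial_bufsize
        head = data[:bufsize]
        result[name] = "binary" if b"\0" in head else "text"
        # Grow the buffer so it could hold this file's longest line; with
        # carry_over=True the larger size leaks into later files.
        longest = max(len(line) for line in data.split(b"\n"))
        bufsize = max(bufsize, longest)
    return result
```

With a = b"x" * 20 and b = b"hello world\0", scanning b alone classifies it as text, because the NUL lies beyond the first 8 bytes. Scanning a first enlarges the buffer, and b is then classified as binary. Passing carry_over=False makes the result for b independent of what was scanned before it.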

Second, 'grep' is not a good tool for determining whether a file is text
or binary, because the distinction between "text" and "binary" is
application-specific: grep's definition suits 'grep', but using it for
other purposes is problematic. I installed the second attached patch to
try to document this better.
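A script that needs a text/binary decision can state its own criterion explicitly instead of inferring it from grep's behavior. Below is a minimal Python sketch that assumes one possible rule, chosen only for illustration: the data must decode as UTF-8 and must contain no characters below U+0020 other than tab, newline, carriage return, and form feed. The name is_text_for_my_app is hypothetical.

```python
# Hypothetical application-specific rule (an assumption, not grep's rule):
# "text" means valid UTF-8 with no control characters below U+0020 except
# tab, newline, carriage return, and form feed. NUL (U+0000) is rejected.
ALLOWED_CONTROLS = {"\t", "\n", "\r", "\f"}

def is_text_for_my_app(data: bytes) -> bool:
    try:
        s = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # invalid UTF-8 counts as binary under this rule
    return all(ch >= " " or ch in ALLOWED_CONTROLS for ch in s)
```

For a file on disk, a caller could pass pathlib.Path(path).read_bytes(). Because the rule is written down in the script itself, its result does not depend on any tool's buffering or on the order in which files are processed.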

Hope this helps.

Boldly closing this bug as fixed; if I'm wrong we can reopen it.