GNU bug report logs - #25707
[PATCH] grep: don't forcefully strip carriage returns
Reported by: Eric Blake <eblake <at> redhat.com>
Date: Mon, 13 Feb 2017 19:25:02 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
> Cc: eggert <at> cs.ucla.edu, 25707 <at> debbugs.gnu.org
> From: Eric Blake <eblake <at> redhat.com>
> Date: Thu, 16 Feb 2017 11:40:29 -0600
>
> On 02/16/2017 11:26 AM, Eli Zaretskii wrote:
>
> >> I'm of the opinion that undossify_input causes more problems than it
> >> solves. We should trust fopen("r") to do the right thing, rather than
> >> reinventing it ourselves.
> >
> > FYI: You'd be losing an important feature for non-Cygwin DOS/Windows
> > users if you remove undossify_input and decide to trust fopen's "r"
> > (or "rt") mode.
>
> "rt" mode is not required to exist. And I don't know any modern
> implementation of "r" mode on a system with non-zero O_BINARY that eats
> ALL \r - both Cygwin and mingw just change \r\n into \n while still
> preserving other \r. The undossify() code that Paul just removed did
> NOT behave the same as text mode (in that it did, perhaps
> unintentionally, eat ALL \r).
I explicitly said my comments were not about Cygwin.
And you are forgetting the "stop at first ^Z" misfeature of text-mode
reads.
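
The three behaviors being contrasted above can be modeled in a few lines. This is only an illustrative sketch in Python, not the code from grep or from any C runtime: the function names and the sample bytes are hypothetical, and the "stop at first ^Z" rule is modeled as described in this message rather than taken from a specific library's documentation.

```python
# Hypothetical models of the input-translation behaviors discussed in the thread.
# None of these functions is grep's actual code or a real C runtime's code.

CTRL_Z = b"\x1a"  # the ^Z byte (0x1A)


def strip_all_cr(data: bytes) -> bytes:
    """Model of 'eat ALL \\r': every carriage return is removed."""
    return data.replace(b"\r", b"")


def crlf_to_lf(data: bytes) -> bytes:
    """Model of 'change \\r\\n into \\n while still preserving other \\r'."""
    return data.replace(b"\r\n", b"\n")


def text_mode_read(data: bytes) -> bytes:
    """Model of a text-mode read that also stops at the first ^Z (assumed)."""
    end = data.find(CTRL_Z)
    if end != -1:
        data = data[:end]  # everything from ^Z onward is never seen
    return crlf_to_lf(data)


# Sample input: two CRLF line endings, one lone CR, and an embedded ^Z.
SAMPLE = b"one\r\ntwo\rstill two\r\nthree\x1afour\r\n"

if __name__ == "__main__":
    for model in (strip_all_cr, crlf_to_lf, text_mode_read):
        print(model.__name__, repr(model(SAMPLE)))
```

On the sample, only the first model drops the lone CR inside the second line, and only the third model loses the data after ^Z, which is the difference each side of the argument is pointing at.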
> I count it worse to TRY and reimplement the OS "r" mode and get the
> implementation wrong, with more lines of code, than to just trust the OS
> to do it correctly in the first place.
It is no use trusting the OS if it doesn't DTRT.
> The undossify() code may do the right thing on text files, but is
> absolutely wrong on binary files.
Grep is mainly a text-processing program. Its use with binary files
is a much rarer use case, and the user has opt-in options for those.
IMO it is more important to DTRT by default in the usual cases than
to avoid erring in the rare cases when the user fails to specify the
opt-in options needed to support binary files correctly.
> No, I don't know of any fopen(,"r") code that eats _all_ CR.
I do.
> Yes, you do make a point that the side effect of reimplementing text
> mode ourselves on a forced binary fd lets us "count" byte offsets where
> the count could be text while the scan was binary, or where the count
> could be binary while the scan was text. But in reality, are there any
> users that ever want a mixed-mode count?
Yes.
> If you are scanning in binary, you want the binary count; if you are
> scanning in text, you want the text count.
That's up to the user; Grep shouldn't second-guess its users, and
shouldn't force them into a specific modus operandi without a fire
escape.
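
To make the "binary count" versus "text count" distinction concrete, here is a small sketch, again in Python and again hypothetical (the function name and sample data are assumptions, and this is not how grep computes its byte offsets). It scans with the CR of each CRLF line ending hidden, and reports both the offset in the original bytes and the offset in the translated text for each matching line.

```python
# Hypothetical sketch: for each line containing `needle`, report the line's
# starting offset counted two ways. Not grep's implementation.

def matching_line_offsets(data: bytes, needle: bytes) -> list[tuple[int, int]]:
    """Return (binary_offset, text_offset) pairs for lines containing needle.

    binary_offset counts every byte of the original input, CRs included.
    text_offset counts bytes after each CRLF has been reduced to LF.
    """
    results = []
    binary_off = 0
    text_off = 0
    for raw in data.split(b"\n"):
        # `raw` lacks its trailing LF; a trailing CR marks a CRLF line ending.
        line = raw[:-1] if raw.endswith(b"\r") else raw
        if needle in line:
            results.append((binary_off, text_off))
        binary_off += len(raw) + 1   # +1 for the LF removed by split()
        text_off += len(line) + 1
    return results


if __name__ == "__main__":
    sample = b"alpha\r\nbeta\r\ngamma\r\n"
    print(matching_line_offsets(sample, b"gamma"))
```

The two counts drift apart by one byte per preceding CRLF line, so a tool that scans in one mode and counts in the other is reporting a deliberately mixed pair, which is the capability under dispute here.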

But this is a futile argument: I don't expect to win it, and I'm well
aware that recent Grep releases are much less friendly to non-POSIX
users than previous ones. Which is why I will stay with Grep 2.10,
which works satisfactorily for me.

The purpose of my message was to get on record about what these
changes mean: they mean losing features which some find useful. You
are not changing code which was written by someone who didn't know
about text-mode I/O, or didn't understand that this could be done
better, faster, etc., by dropping features.

This bug report was last modified 5 years and 278 days ago.