2017-01-28 01:48:19 +0000, Assaf Gordon:
[...]
> On Thu, Aug 13, 2015 at 03:55:20PM +0100, Stephane Chazelas wrote:
> >[...] The behaviour
> >of sed on non-text input is unspecified, so it doesn't require
> >that . not match a byte that is not part of a valid character.
> >[...]
> >That POSIX requirement is true for regexec() but not for text
> >utilities.
>
> I'm far from familiar with POSIX intricacies, but doesn't that sound a bit
> strange? I would naively think that POSIX would encourage POSIX-compliant
> text utilities to use the system's native regexec implementation, instead of
> supporting slightly different semantics...

Hi Assaf,

It doesn't preclude the use of regexec. It just leaves the
behaviour unspecified when the input is not text, such as when
lines are longer than LINE_MAX, when they contain NUL bytes,
when they contain sequences of bytes not forming valid
characters, or when there are characters after the last newline
character.

On sequences of bytes that don't form valid characters, you're
free to exit with an error, shut down the computer, or do
whatever you like; POSIX doesn't care.

What POSIX tells the users of the POSIX API (that is, script
writers and sed users) is that they can't expect anything on
non-text input.

GNU sed already handles lines longer than LINE_MAX nicely, as
well as lines containing NUL bytes or an unterminated last line.

I'd argue that for sequences of bytes that don't form valid
characters, it would be nicer if "." or "[^anything]" matched
each of the individual bytes. That's what bash's *, ? and
[!anything] fnmatch() patterns do (even though in that case
POSIX seems to forbid it; that has been discussed on the Austin
Group mailing list as well).

> >See that discussion on the Austin Group mailing list:
> >http://thread.gmane.org/gmane.comp.standards.posix.austin.general/11059/focus=11098
>
> This link seems broken. Would you know where to find this discussion online
> ?
[...]

Yes.
They relied on gmane for the mailing list archive. The web
interface was discontinued
(https://lars.ingebrigtsen.no/2016/07/28/the-end-of-gmane/) and
then taken over by somebody else, but not everything is back:
https://lars.ingebrigtsen.no/2016/09/06/gmane-alive/comment-page-1/

You can still find the discussion using the NNTP interface.

I attach the most relevant message (from Geoff Clare of the
Austin Group). I can send you the whole discussion as a mailbox
file if you like.

-- 
Stephane