GNU bug report logs - #40242
n as delimiter alias


Package: sed;

Reported by: Oğuz <oguzismailuysal <at> gmail.com>

Date: Thu, 26 Mar 2020 15:31:02 UTC

Severity: normal

Tags: confirmed

Merged with 40239

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: Assaf Gordon <assafgordon <at> gmail.com>
To: Oğuz <oguzismailuysal <at> gmail.com>, 40242 <at> debbugs.gnu.org
Subject: bug#40242: n as delimiter alias
Date: Mon, 30 Mar 2020 22:42:09 -0600
tags 40242 confirmed
stop

Hello,

On 2020-03-25 11:30 p.m., Oğuz wrote:
> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
> match 'n' when 'n' is the delimiter. See:
> 
> $ echo t | sed 'st\ttt' | xxd
> 00000000: 0a                                       .
> $
> $ echo n | sed 'sn\nnn' | xxd
> 00000000: 6e0a
> 
> Is this a bug or is there a sound logic behind this?

Thank you for finding this interesting edge-case.

I think it is a (very old) bug. I'm not sure about its origin,
perhaps Jim or Paolo can comment.

First,
let's start with what's expected (slightly modifying your examples):

In the canonical usage, "\t" becomes a TAB, so the literal "t" is not replaced:

   $ printf t | sed 's/\t//' | od -a -An
      t

Using a different delimiter character, "q" instead of "/", works the same:

   $ printf t | sed 'sq\tqq' | od -a -An
      t

The sed manual says (in section "3.3 The s command"):
      "
      The / characters may be uniformly replaced by any other single
      character within any given s command.

      The / character (or whatever other character is used in its
      stead) can appear in the regexp or replacement only if it is
      preceded by a \ character.
      "

This is the reason "\t" represents a regular "t" (not TAB)
*if* the substitute command's delimiter is "t" as well:

      $ printf t | sed 'st\ttt' | od -a -An
      [no output, as expected]

And similarly for other characters:

      $ printf x | sed 'sx\xxx' | od -a -An
      $ printf a | sed 'sa\aaa' | od -a -An
      $ printf z | sed 'sz\zzz' | od -a -An
      [no output, as expected]
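
To make the rule from the manual concrete, here is a minimal sketch in C of
how the end of a field could be located. It is only an illustration: the
function name "find_unescaped" is made up for this example and is not part
of sed's source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from sed.
   Return the index of the first occurrence of `delim` in `s` that is
   not preceded by a backslash, or -1 if there is none.  */
int
find_unescaped (const char *s, char delim)
{
  assert (s != NULL);
  for (int i = 0; s[i] != '\0'; i++)
    {
      if (s[i] == '\\' && s[i + 1] != '\0')
        i++;                    /* skip the escaped character */
      else if (s[i] == delim)
        return i;               /* unescaped delimiter: the field ends here */
    }
  return -1;
}
```

For the text that follows "st" in 'st\ttt', that is \ttt, the function
returns 2: the escaped "t" at index 1 is skipped and belongs to the
pattern, and the "t" at index 2 closes the field.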

---

Second,
The "\n" case behaves differently: whichever separator is used,
it is always treated as a newline, never as a literal "n",
even if the separator is "n":

These are correct, as expected:
    $ printf n | sed 's/\n//' | od -a -An
       n
    $ printf n | sed 'sx\nxx' | od -a -An
       n

Here we'd expect "\n" to be treated as a literal "n" character
rather than a newline, but it is not (as you've found):

    $ printf n | sed 'sn\nnn' | od -a -An
       n
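
The unchanged "n" in the output above is what one would see if the pattern
is a newline rather than the letter "n". A small sketch using POSIX
regcomp/regexec as a stand-in shows the difference. This is an assumption
made for illustration: sed has its own regex layer, and the helper name
"pattern_matches" is made up here.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not taken from sed.
   Return 1 if the POSIX basic regular expression `pattern` matches
   anywhere in `text`, 0 if it does not, and -1 if the pattern fails
   to compile.  */
int
pattern_matches (const char *pattern, const char *text)
{
  regex_t re;
  assert (pattern != NULL && text != NULL);
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0 ? 1 : 0;
}
```

Here pattern_matches ("\n", "n") returns 0, because a newline pattern
finds nothing to match in the input "n", while pattern_matches ("n", "n")
returns 1.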

----

In the code, the "match_slash" function [1] is used to find
the delimiters of the "s" command (typically slashes).
Special handling happens if a backslash is found [2],
and in lines 557-8 there's this conditional:

              else if (ch == 'n' && regex)
                ch = '\n';

Which forces any "\n" to be a newline, regardless of whether the
delimiter itself is an "n".
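
To show the effect of that conditional in isolation, here is a simplified
model of the scanning loop. It is a sketch, not the real match_slash: the
name "scan_field" and the "n_is_newline" switch are invented for this
example, and the real function's "regex" argument, its special handling of
"&" in the replacement, and its multibyte handling are all left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the delimiter scan; not sed's code.
   Copy one field from `src` into `out`, stopping at the first unescaped
   `delim`.  Return the number of bytes written, or -1 if no closing
   delimiter is found.  `out` must have room for strlen (src) + 1 bytes.
   `n_is_newline` switches the conditional quoted above on or off.  */
int
scan_field (const char *src, char delim, int n_is_newline, char *out)
{
  int len = 0;
  assert (src != NULL && out != NULL);
  for (; *src != '\0'; src++)
    {
      char ch = *src;
      if (ch == delim)
        {
          out[len] = '\0';
          return len;
        }
      if (ch == '\\')
        {
          ch = *++src;
          if (ch == '\0')
            break;
          if (ch == 'n' && n_is_newline)
            ch = '\n';          /* the conditional under discussion */
          else if (ch != delim)
            out[len++] = '\\';  /* keep the backslash for other escapes */
        }
      out[len++] = ch;
    }
  return -1;
}
```

With n_is_newline set, scanning \nnn (the text after "sn" in 'sn\nnn')
with delimiter "n" yields a single newline byte. With it cleared, the same
input yields the letter "n", which is how "\t" already behaves when the
delimiter is "t".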

[1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
[2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552

In older sed versions, these two lines were protected by
"#ifndef REG_PERL" [3], so perhaps it had something to do with regex
variants. But the origin of these lines predates the git history.
Jim/Paolo - any ideas what this relates to?

[3] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551

---

Interestingly, removing these two lines does not cause
any test failures, so this might be easy to fix without causing
any regressions.


For now I'm leaving this item open until we decide how to deal with it.

regards,
 - assaf


This bug report was last modified 2 years and 294 days ago.