GNU bug report logs -
#40242
n as delimiter alias
Reported by: Oğuz <oguzismailuysal <at> gmail.com>
Date: Thu, 26 Mar 2020 15:31:02 UTC
Severity: normal
Tags: confirmed
Merged with 40239
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
Message #8 received at 40242 <at> debbugs.gnu.org:
tags 40242 confirmed
stop
Hello,
On 2020-03-25 11:30 p.m., Oğuz wrote:
> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
> match 'n' when 'n' is the delimiter. See:
>
> $ echo t | sed 'st\ttt' | xxd
> 00000000: 0a .
> $
> $ echo n | sed 'sn\nnn' | xxd
> 00000000: 6e0a
>
> Is this a bug or is there a sound logic behind this?
Thank you for finding this interesting edge-case.
I think it is a (very old) bug. I'm not sure about its origin;
perhaps Jim or Paolo can comment.
First, let's start with what's expected (slightly modifying your examples):
In the canonical usage, "\t" becomes a TAB, so "t" is not replaced:
$ printf t | sed 's/\t//' | od -a -An
t
Using a different delimiter ("q" instead of "/") works the same way:
$ printf t | sed 'sq\tqq' | od -a -An
t
The sed manual says (in section "3.3 The s command"):
"
The / characters may be uniformly replaced by any other single
character within any given s command.
The / character (or whatever other character is used in its
stead) can appear in the regexp or replacement only if it is
preceded by a \ character.
"
This is the reason "\t" represents a regular "t" (not TAB)
*if* the substitute command's delimiter is "t" as well:
$ printf t | sed 'st\ttt' | od -a -An
[no output, as expected]
And similarly for other characters:
$ printf x | sed 'sx\xxx' | od -a -An
$ printf a | sed 'sa\aaa' | od -a -An
$ printf z | sed 'sz\zzz' | od -a -An
[no output, as expected]
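The manual's rule can be modeled in a few lines of C. The sketch below is hypothetical (the names "copy_part" and "split_s_command" are illustrative, not sed's): it takes the character after "s" as the delimiter, turns an escaped delimiter into that plain character, and passes every other backslash escape through unchanged for the regex engine to interpret. It ignores flags, multibyte delimiters and other details of the real parser, so it captures only the documented behavior, including what one would expect for "sn\nnn".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: copy one part of an s command from *src into
   out, stopping at the first unescaped delimiter.  Following the
   manual's rule, "\<delim>" yields a plain <delim>; any other "\x"
   is kept as-is.  Returns false if the text ends before a delimiter. */
static bool
copy_part (const char **src, char delim, char *out)
{
  const char *p = *src;
  while (*p != '\0')
    {
      char ch = *p++;
      if (ch == delim)
        {
          *out = '\0';
          *src = p;              /* resume after the delimiter */
          return true;
        }
      if (ch == '\\' && *p != '\0')
        {
          ch = *p++;
          if (ch != delim)
            *out++ = '\\';       /* not the delimiter: keep the escape */
        }
      *out++ = ch;
    }
  *out = '\0';
  return false;
}

/* Hypothetical: split "s<d>REGEX<d>REPLACEMENT<d>" into its two parts.
   Each output buffer must hold at least strlen(cmd) bytes.  Trailing
   flags are ignored.  Returns false on a malformed command.  */
static bool
split_s_command (const char *cmd, char *regex, char *replacement)
{
  assert (cmd != NULL);
  if (cmd[0] != 's' || cmd[1] == '\0' || cmd[1] == '\\' || cmd[1] == '\n')
    return false;
  char delim = cmd[1];
  const char *p = cmd + 2;
  return copy_part (&p, delim, regex) && copy_part (&p, delim, replacement);
}
```

For example, split_s_command("st\\ttt", re, rep) yields the regex "t" and an empty replacement, while split_s_command("sn\\nnn", re, rep) yields the regex "n", which is the result the report expected.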
---
Second, the "\n" case behaves differently: regardless of which
delimiter is used, "\n" is always treated as a newline, never as a
literal "n", even when the delimiter itself is "n":
These are correct, as expected:
$ printf n | sed 's/\n//' | od -a -An
n
$ printf n | sed 'sx\nxx' | od -a -An
n
Here, we'd expect "\n" to be treated as a literal "n" character,
not as a newline, but it is not (as you've found):
$ printf n | sed 'sn\nnn' | od -a -An
n
----
In the code, the "match_slash" function [1] is used to find
the delimiters of the "s" command (typically slashes).
Special handling happens when a backslash is found [2],
and in lines 557-8 there's this conditional:
else if (ch == 'n' && regex)
ch = '\n';
This forces any "\n" in the regex to become a newline, regardless
of whether the delimiter itself is "n".
[1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
[2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552
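To see why the position of those two lines matters, here is a simplified C model of the loop. It is not sed's code: the name "model_match_slash", its string-based interface and the "n_is_newline" switch are assumptions made for illustration, and the real function works on sed's input and handles cases this sketch ignores. With the switch on, it mimics the two quoted lines, which take effect before the escaped character is compared with the delimiter; with it off, it mimics the loop with those lines removed, so "\n" with an "n" delimiter falls through to the escaped-delimiter case.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model of the loop in sed's match_slash() (NOT the real
   code).  Copies text from p into out up to the first unescaped
   delimiter and returns a pointer just past it, or NULL if the text
   ends first.  out must hold at least strlen(p) + 1 bytes.

   n_is_newline = true  models the two lines quoted above;
   n_is_newline = false models the same loop with them removed.  */
static const char *
model_match_slash (const char *p, char delim, bool regex,
                   bool n_is_newline, char *out)
{
  for (; *p != '\0'; p++)
    {
      char ch = *p;
      if (ch == delim)
        {
          *out = '\0';
          return p + 1;          /* unescaped delimiter ends this part */
        }
      if (ch == '\\')
        {
          ch = *++p;
          if (ch == '\0')
            break;               /* trailing backslash: unterminated */
          if (n_is_newline && regex && ch == 'n')
            ch = '\n';           /* checked before the delimiter test */
          else if (ch != delim)
            *out++ = '\\';       /* ordinary escape: keep "\x" */
          /* otherwise: escaped delimiter, stored as a plain char */
        }
      *out++ = ch;
    }
  *out = '\0';
  return NULL;
}
```

In this model, the regex part of "sn\nnn" comes out as a single newline when the switch is on, so it cannot match the input "n"; with the switch off it comes out as "n", just like the "st\ttt" case. With the switch off, "\n" under other delimiters reaches the regex stage as the two characters backslash and "n", so whether removing the lines is safe depends on how that later stage treats the escape; the model does not cover it.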
In older sed versions, these two lines were protected by
"#ifndef REG_PERL" [3], so perhaps they had something to do with regex
variants. But the origin of these lines predates the git history.
Jim/Paolo - any ideas what this relates to?
[3] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551
---
Interestingly, removing these two lines does not cause
any test failures, so this might be easy to fix without causing
any regressions.
For now I'm leaving this item open until we decide how to deal with it.
regards,
- assaf
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.