Thanks for the reply. This might not be a bug, though. I sent a similar mail
(https://www.mail-archive.com/austin-group-l@opengroup.org/msg05881.html)
to the Austin Group mailing list asking what the expected behavior is in
this case, and I was told
(https://www.mail-archive.com/austin-group-l@opengroup.org/msg05891.html)
that both behaviors, yielding n or an empty line, are correct and that the
standard should *probably* be amended to explicitly state that this is
unspecified. Apparently
(https://www.mail-archive.com/austin-group-l@opengroup.org/msg05893.html)
some other UNIXes adopted the same practice as GNU sed (or vice versa; I
don't know which one is older).

Regards

On Tuesday, 31 March 2020, Assaf Gordon wrote:

> tags 40242 confirmed
> stop
>
> Hello,
>
> On 2020-03-25 11:30 p.m., Oğuz wrote:
>
>> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
>> match 'n' when 'n' is the delimiter. See:
>>
>> $ echo t | sed 'st\ttt' | xxd
>> 00000000: 0a .
>> $
>> $ echo n | sed 'sn\nnn' | xxd
>> 00000000: 6e0a
>>
>> Is this a bug or is there a sound logic behind this?
>>
>
> Thank you for finding this interesting edge-case.
>
> I think it is a (very old) bug. I'm not sure about its origin,
> perhaps Jim or Paolo can comment.
>
> First,
> let's start with what's expected (slightly modifying your examples):
>
> The canonical usage, here "\t" becomes a TAB, and "t" is not replaced:
>
> $ printf t | sed 's/\t//' | od -a -An
> t
>
> Then, using a different character "q" instead of "/", works the same:
>
> $ printf t | sed 'sq\tqq' | od -a -An
> t
>
> The sed manual says (in section "3.3 The s command"):
> "
> The / characters may be uniformly replaced by any other single
> character within any given s command.
>
> The / character (or whatever other character is used in its
> stead) can appear in the regexp or replacement only if it is
> preceded by a \ character.
> "
>
> This is the reason "\t" represents a regular "t" (not TAB)
> *if* the substitute command's delimiter is "t" as well:
>
> $ printf t | sed 'st\ttt' | od -a -An
> [no output, as expected]
>
> And similarly for other characters:
>
> printf x | sed 'sx\xxx' | od -a -An
> printf a | sed 'sa\aaa' | od -a -An
> printf z | sed 'sz\zzz' | od -a -An
> [no output, as expected]
>
> ---
>
> Second,
> The "\n" case behaves differently, regardless of which
> separator is used. It is always treated as "\n" (new line),
> never literal "n", even if the separator is "n":
>
> These are correct, as expected:
> $ printf n | sed 's/\n//' | od -a -An
> n
> $ printf n | sed 's/\n//' | od -a -An
> n
> $ printf n | sed 'sx\nxx' | od -a -An
> n
>
> Here, we'd expect "\n" to be treated as a literal "n" character,
> not "\n", but it is not (as you've found):
>
> $ printf n | sed 'sn\nnn' | od -a -An
> n
>
> ----
>
> In the code, the "match_slash" function [1] is used to find
> the delimiters of the "s" command (typically "slashes").
> Special handling happens if a slash is found [2],
> And in lines 557-8 there's this conditional:
>
> else if (ch == 'n' && regex)
>   ch = '\n';
>
> Which forces any "\n" to be a new-line, regardless if the
> delimiter itself was an "n".
>
> [1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
> [2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552
>
> In older sed versions, these two lines were protected by
> "#ifndef REG_PERL" [3] so perhaps it had something to do with regex
> variants. But the origin of this line predates the git history.
> Jim/Paolo - any ideas what this relates to?
>
> [3] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551
>
> ---
>
> Interestingly, removing these two lines does not cause
> any test failures, so this might be easy to fix without causing
> any regressions.
>
>
> For now I'm leaving this item open until we decide how to deal with it.
>
> regards,
> - assaf
>

--
Oğuz
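
As a rough illustration of the delimiter handling described in the quoted message, here is a simplified C model of how the regex part of an "s" command might be scanned. It is not the sed source: read_pattern and keep_newline_rule are names made up for this sketch, and the real match_slash also deals with multibyte input and with the replacement part, both of which are ignored here.

```c
#include <stddef.h>

/* Hypothetical, simplified model of scanning the regex part of an "s"
   command. This is NOT the sed implementation; the function name and the
   keep_newline_rule flag are invented for illustration.

   text:              the characters that follow the opening delimiter
   delim:             the delimiter character chosen by the user
   keep_newline_rule: nonzero mimics "else if (ch == 'n' && regex) ch = '\n';"
   out, outsz:        buffer that receives the NUL-terminated pattern

   Returns the pattern length, or -1 if there is no closing delimiter
   or the buffer is too small. */
int read_pattern(const char *text, char delim, int keep_newline_rule,
                 char *out, size_t outsz)
{
    size_t n = 0;

    for (const char *p = text; *p != '\0'; p++) {
        char ch = *p;

        if (ch == delim) {              /* unescaped delimiter ends the pattern */
            if (n >= outsz)
                return -1;
            out[n] = '\0';
            return (int)n;
        }

        if (ch == '\\') {
            ch = *++p;                  /* look at the escaped character */
            if (ch == '\0')
                return -1;
            if (ch == 'n' && keep_newline_rule) {
                ch = '\n';              /* "\n" always becomes a newline */
            } else if (ch != delim) {
                /* Not an escaped delimiter: keep the backslash so a later
                   regex stage can interpret sequences such as "\t". */
                if (n + 1 >= outsz)
                    return -1;
                out[n++] = '\\';
            }
            /* An escaped delimiter falls through and is stored literally. */
        }

        if (n + 1 >= outsz)
            return -1;
        out[n++] = ch;
    }
    return -1;                          /* ran out of input before the delimiter */
}
```

With keep_newline_rule set, the text `\nnn` scanned with delimiter 'n' yields a pattern holding a single newline, so an input of "n" is left alone. With the flag cleared, the same text yields the pattern "n", which matches what the '\t' case already does when 't' is the delimiter.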