GNU bug report logs - #23635
possible bug in \c escape handling

Package: sed

Reported by: Assaf Gordon <assafgordon <at> gmail.com>

Date: Sat, 28 May 2016 01:09:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 23635 in the body.
You can then email your comments to 23635 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-sed <at> gnu.org:
bug#23635; Package sed. (Sat, 28 May 2016 01:09:02 GMT)

Acknowledgement sent to Assaf Gordon <assafgordon <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Sat, 28 May 2016 01:09:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: bug-sed <at> gnu.org
Subject: possible bug in \c escape handling
Date: Fri, 27 May 2016 21:08:21 -0400
Hello,

There might be a small bug in processing of GNU extension escape sequence "\c".

When the character following "\c" is a backslash, the code consumes only one character, leading to inconsistent and incorrect output.
Example:

  $ echo a | sed 's/./\c\\/' | od -c
  0000000 034 \ \n
  0000003
  $ echo a | sed 's/./\c\d/' | od -c
  0000000 034 d \n
  0000003

but:

  $ echo a | sed 's/./\c\/' | od -c
  sed: -e expression #1, char 8: unterminated `s' command
  0000000

Meaning there is no way to generate the character '\034' (0x1C) alone with "\c".

This is also somewhat inconsistent because it consumes a single backslash character
(whereas everywhere else a single backslash is the escape character itself).

For comparison, other characters behave as expected:

  $ sed 's/./\cA/' in | od -c
  0000000 001 \n
  0000002
  $ sed 's/./\c[/' in | od -c
  0000000 033 \n
  0000002
  $ sed 's/./\c]/' in | od -c
  0000000 035 \n
  0000002

As a side effect, it could also be confusing if the syntax allows 'recursive' escapes,
such as "\c\x41", which might be argued to be '\c' of the following character,
which should be first evaluated as \x41 ('A'), resulting in "\cA".

The attached patch fixes the problem with the following rules:
1. '\c\\' = Control-Backslash = ASCII 0x1C (octal 034).
2. Any other backslash combinations after "\c" are rejected, and sed aborts.
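
The two rules can be sketched as a small shell classifier. This is only an illustration of the decision order, not the code in sed/compile.c: the function name classify_after_c and its messages are hypothetical, and it ignores how the "s" command handles its delimiter.

```shell
# classify_after_c TEXT: TEXT is whatever follows "\c" in the script.
# Hypothetical sketch of the two rules; not sed's actual implementation.
classify_after_c() {
  case $1 in
    '\\'*) printf '%s\n' 'control-backslash (octal 034)' ;;                  # rule 1
    '\'*)  printf '%s\n' 'rejected: backslash escape after \c'; return 1 ;;  # rule 2
    *)     printf '%s\n' 'ordinary \cX' ;;
  esac
}

classify_after_c '\\/'            # control-backslash (octal 034)
classify_after_c '\d/' || true    # rejected: backslash escape after \c
classify_after_c 'A/'             # ordinary \cX
```

The double-backslash pattern has to be tested first, because the single-backslash pattern would also match it.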

Tests included. Comments are welcome.

- assaf





[0001-sed-reject-recursive-escaping-after-c.patch (text/x-patch, attachment)]

Information forwarded to bug-sed <at> gnu.org:
bug#23635; Package sed. (Sat, 28 May 2016 22:07:02 GMT)

Message #8 received at 23635 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 23635 <at> debbugs.gnu.org
Subject: Re: bug#23635: possible bug in \c escape handling
Date: Sat, 28 May 2016 15:06:25 -0700
On Fri, May 27, 2016 at 6:08 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> Hello,
>
> There might be a small bug in processing of GNU extension escape sequence
> "\c".
>
> When the character following "\c" is a backslash, the code consumes only one
> character, leading to inconsistent and incorrect output.
> Example:
>
>   $ echo a | sed 's/./\c\\/' | od -c
>   0000000 034 \ \n
>   0000003
>   $ echo a | sed 's/./\c\d/' | od -c
>   0000000 034 d \n
>   0000003
>
> but:
>
>   $ echo a | sed 's/./\c\/' | od -c
>   sed: -e expression #1, char 8: unterminated `s' command
>   0000000
>
> Meaning there is no way to generate the character '\034' (0x1C) alone with "\c".
>
> This is also somewhat inconsistent because it consumes a single backslash
> character
> (whereas everywhere else a single backslash is the escape character itself).
>
> For comparison, other characters behave as expected:
>
>   $ sed 's/./\cA/' in | od -c
>   0000000 001 \n
>   0000002
>   $ sed 's/./\c[/' in | od -c
>   0000000 033 \n
>   0000002
>   $ sed 's/./\c]/' in | od -c
>   0000000 035 \n
>   0000002
>
> As a side effect, it could also be confusing if the syntax allows
> 'recursive' escapes,
> such as "\c\x41", which might be argued to be '\c' of the following
> character,
> which should be first evaluated as \x41 ('A'), resulting in "\cA".
>
> The attached patch fixes the problem with the following rules:
> 1. '\c\\' = Control-Backslash = ASCII 0x1C (octal 034).
> 2. Any other backslash combinations after "\c" are rejected, and sed aborts.
>
> Tests included. Comments are welcome.

Nice catch. I like the patch.
So far, I can make only two suggestions:
  - add a NEWS entry, since this is a bug fix
  - I have a slight preference for the one-liner printf '%s\n' a a a a a a a
rather than your 7-line here-document to generate that same output in
the test case.
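
For reference, the two forms are interchangeable because printf reuses its format string for each remaining argument. A minimal sketch; the function names seven_heredoc and seven_printf are hypothetical and not taken from the test script.

```shell
# Seven lines of "a", written two ways (illustrative names only).
seven_heredoc() {
cat << 'EOF'
a
a
a
a
a
a
a
EOF
}

# printf applies '%s\n' once per argument, so seven arguments
# yield seven lines.
seven_printf() {
  printf '%s\n' a a a a a a a
}

[ "$(seven_heredoc)" = "$(seven_printf)" ] && echo identical
```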

And a comment wording nit:

+# Before sed-4.3, this resulted in '\034d' .
+# now it should be rejected.

I prefer to say e.g.,

# Before sed-4.3, this resulted in '\034d'. Now, it is rejected.

Thank you!




Information forwarded to bug-sed <at> gnu.org:
bug#23635; Package sed. (Sun, 29 May 2016 02:33:01 GMT)

Message #11 received at 23635 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 23635 <at> debbugs.gnu.org
Subject: Re: bug#23635: possible bug in \c escape handling
Date: Sat, 28 May 2016 22:31:54 -0400
Hello,

Thank you for the review.
Attached an improved version.

Regarding when the bug was introduced (in the 'NEWS'), version 3.02 did not support \c escapes, and version 4.0.6 had this bug (as does the first git commit). I wrote:
    [bug introduced in the sed-4.0.* releases]

Comments welcomed,
 - assaf
 

[0001-sed-reject-recursive-escaping-after-c.patch (application/octet-stream, attachment)]

Information forwarded to bug-sed <at> gnu.org:
bug#23635; Package sed. (Sun, 29 May 2016 03:41:02 GMT)

Message #14 received at 23635 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 23635 <at> debbugs.gnu.org
Subject: Re: bug#23635: possible bug in \c escape handling
Date: Sat, 28 May 2016 20:40:24 -0700
On Sat, May 28, 2016 at 7:31 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> Hello,
>
> Thank you for the review.
> Attached an improved version.
>
> Regarding when the bug was introduced (in the 'NEWS'), version 3.02 did not support \c escapes, and version 4.0.6 had this bug (as does the first git commit). I wrote:
>     [bug introduced in the sed-4.0.* releases]

Thanks for the quick update!
One more thing I noticed is that you use here docs
that interpolate. Just as I prefer to use single-quoted strings
most of the time, e.g., to avoid having to backslash-escape
every backslash, I prefer to use quoted here docs as in the
attached. Also, I prefer to space-delimit operators like '<', '<<', and '>'.
At least on one side.
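
To illustrate the here-doc point: with an unquoted delimiter the shell processes backslashes in the body, while a quoted delimiter leaves the body literal. The sketch below is not taken from the attached patch; the function names are hypothetical and the sed expression is borrowed from the original report.

```shell
# Quoted delimiter: the body is taken literally.
quoted() {
cat << 'EOF'
s/./\c\\/
EOF
}

# Unquoted delimiter: the shell processes backslashes, so each one
# must be doubled to obtain the same text.
unquoted_doubled() {
cat << EOF
s/./\\c\\\\/
EOF
}

# Unquoted delimiter without doubling: \\ collapses to \ and the
# sed script silently changes.
unquoted_plain() {
cat << EOF
s/./\c\\/
EOF
}

quoted             # s/./\c\\/
unquoted_doubled   # s/./\c\\/
unquoted_plain     # s/./\c\/
```

The last form ends up as the expression that the original report shows failing with an unterminated `s' command, which is why a quoted here doc is the safer default for test scripts full of backslashes.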

Also, I removed a stray "EOF" after the final "Exit..." line.

Here's the proposed delta, on top of your patch:
[k.patch (text/x-patch, attachment)]

Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Mon, 30 May 2016 02:01:02 GMT)

Notification sent to Assaf Gordon <assafgordon <at> gmail.com>:
bug acknowledged by developer. (Mon, 30 May 2016 02:01:02 GMT)

Message #19 received at 23635-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 23635-done <at> debbugs.gnu.org
Subject: Re: bug#23635: possible bug in \c escape handling
Date: Sun, 29 May 2016 19:00:25 -0700
On Sat, May 28, 2016 at 8:40 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Sat, May 28, 2016 at 7:31 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
>> Hello,
>>
>> Thank you for the review.
>> Attached an improved version.
>>
>> Regarding when the bug was introduced (in the 'NEWS'), version 3.02 did not support \c escapes, and version 4.0.6 had this bug (as does the first git commit). I wrote:
>>     [bug introduced in the sed-4.0.* releases]
>
> Thanks for the quick update!
> One more thing I noticed is that you use here docs
> that interpolate. Just as I prefer to use single-quoted strings
> most of the time, e.g., to avoid having to backslash-escape
> every backslash, I prefer to use quoted here docs as in the
> attached. Also, I prefer to space-delimit operators like '<', '<<', and '>'.
> At least on one side.
>
> Also, I removed a stray "EOF" after the final "Exit..." line.
>
> Here's the proposed delta, on top of your patch:

I've pushed your patch amended with that change, and tweaked the log
message to have these lines: capitalized first word of each sentence
and added the (T) and (Bug fixes) qualifiers:

    * sed/compile.c: (RECURSIVE_ESCAPE_C): New error message.
    (normalize_text): Check for \c-backslash, reject recursive escaping.
    * testsuite/recursive-escape-c.sh: New file. Test new behaviour.
    * testsuite/Makefile.am (T): Add new test.
    * NEWS (Bug fixes): Mention it.

Thanks again.
As I write this, I realize that I should have referenced the bug
report URL in the commit log.  Oh well.




Information forwarded to bug-sed <at> gnu.org:
bug#23635; Package sed. (Mon, 30 May 2016 05:31:01 GMT)

Message #22 received at 23635-done <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 23635-done <at> debbugs.gnu.org
Subject: Re: bug#23635: possible bug in \c escape handling
Date: Mon, 30 May 2016 01:30:42 -0400
> I've pushed your patch amended with that change, and tweaked the log
> message to have these lines: capitalized first word of each sentence
> and added the (T) and (Bug fixes) qualifiers:
> 

Thank you!





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 27 Jun 2016 11:24:03 GMT)
