GNU bug report logs - #33793
sed bug with regular expressions


Package: sed;

Reported by: Uladzimir Panasiuk <v.s.panasyuk <at> gmail.com>

Date: Tue, 18 Dec 2018 17:19:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Eric Blake <eblake <at> redhat.com>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#33793: closed (sed bug with regular expressions)
Date: Tue, 18 Dec 2018 18:24:03 +0000
[Message part 1 (text/plain, inline)]
Your message dated Tue, 18 Dec 2018 12:23:16 -0600
with message-id <1e16a005-8beb-8b86-01b8-5fabb4da6d33 <at> redhat.com>
and subject line Re: bug#33793: sed bug with regular expressions
has caused the debbugs.gnu.org bug report #33793,
regarding sed bug with regular expressions
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
33793: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=33793
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Uladzimir Panasiuk <v.s.panasyuk <at> gmail.com>
To: bug-sed <at> gnu.org
Subject: sed bug with regular expressions
Date: Tue, 18 Dec 2018 15:50:49 +0300
[Message part 3 (text/plain, inline)]
Hi. I've found a bug in sed. Here is how to reproduce it:
1) Run bash
2) Execute the command
echo weather -5.0 | sed 's/[^0-9\-\.]//g'
3) You will get "5.0". The expected output is "-5.0".

BUT
if you execute
echo weather -5.0 | sed 's/[^0-9\.\-]//g'
you'll get the correct output "-5.0".

I am using GNU sed version 4.5 on Manjaro Linux.
[Message part 5 (message/rfc822, inline)]
From: Eric Blake <eblake <at> redhat.com>
To: Uladzimir Panasiuk <v.s.panasyuk <at> gmail.com>, 33793-done <at> debbugs.gnu.org,
 GNU bug control <control <at> debbugs.gnu.org>
Subject: Re: bug#33793: sed bug with regular expressions
Date: Tue, 18 Dec 2018 12:23:16 -0600
tag 33793 notabug
thanks

On 12/18/18 6:50 AM, Uladzimir Panasiuk wrote:
> Hi. I've found a bug in sed. Here is how to reproduce it:
> 1) Run bash
> 2) Execute the command
> echo weather -5.0 | sed 's/[^0-9\-\.]//g'

You used two range expressions in this regex (0-9 and \-\), but the
result is the same as if you had used this regex with only one range
expression:

's/[^0-9\.]//g'

Either way, you requested all characters except for the 10 digits, a
literal backslash, or a literal dot.  Remember, the range expression
\-\ (from backslash to backslash) matches a single character, the
backslash.  Since '-' is not among the characters listed in the negated
[] expression, it matches, and sed correctly strips it.
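
The equivalence described above can be checked by running the two
expressions side by side. This is a minimal sketch, assuming a POSIX
shell and a POSIX-conforming sed such as GNU sed; the variable names and
the input containing a backslash are illustrative and are not part of
the original report.

```shell
# Both bracket expressions list the same set: digits, backslash, dot.
two_ranges=$(echo 'weather -5.0' | sed 's/[^0-9\-\.]//g')
one_range=$(echo 'weather -5.0' | sed 's/[^0-9\.]//g')
echo "$two_ranges"   # 5.0
echo "$one_range"    # 5.0

# A backslash in the input survives, because \ is in the listed set.
kept=$(printf '%s\n' 'a\b -5.0' | sed 's/[^0-9\-\.]//g')
printf '%s\n' "$kept"   # \5.0
```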

> 3) You will get "5.0". Expected output is "-5.0"

You might be remembering the behavior of Perl regex, where \ inside []
is an escape character.  But that's not how POSIX regex behaves: inside
[], \ is literal, and there are no escape characters.
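
A small sketch of that difference, assuming a POSIX shell and a
POSIX-conforming sed (the input string and the variable name are made up
for illustration): the bracket expression [\.] lists two characters, so
both the backslash and the dot are replaced. Under a Perl-style reading,
only the dot would be.

```shell
# In a POSIX bracket expression, [\.] lists two characters: \ and .
result=$(printf '%s\n' 'a\b.c' | sed 's/[\.]/X/g')
printf '%s\n' "$result"   # aXbXc
```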

> 
> BUT
> If you exec
> echo weather -5.0 | sed 's/[^0-9\.\-]//g'

Here, your regex only has one range expression, but lists \ twice.  The
repetition is harmless, but means that your expression is the same as
this shorter one:

's/[^0-9\.-]//g'
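
To see that the repeated \ changes nothing, both forms can be run on an
input that contains a backslash. This is only a sketch, assuming a POSIX
shell and a POSIX-conforming sed; the input and the variable names are
illustrative.

```shell
input='a\b -5.0'   # illustrative input containing a backslash
long=$(printf '%s\n' "$input" | sed 's/[^0-9\.\-]//g')
short=$(printf '%s\n' "$input" | sed 's/[^0-9\.-]//g')
printf '%s\n%s\n' "$long" "$short"   # both lines: \-5.0
```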

It is not obvious from your input whether you intended to preserve
literal backslashes or not, but if not, you probably meant to write:

's/[^0-9.-]//g'

with no backslash, and with the - last (as that is one of the few places 
that you can write - to be matched as itself rather than treated as a 
range operator between neighboring characters).
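
As a rough illustration of where - can be written literally, the sketch
below lists it last and, as POSIX also allows, first (immediately after
the ^). It assumes a POSIX shell and a POSIX-conforming sed; the variable
names and the second input are illustrative.

```shell
input='weather -5.0'
last=$(printf '%s\n' "$input" | sed 's/[^0-9.-]//g')    # - listed last
first=$(printf '%s\n' "$input" | sed 's/[^-0-9.]//g')   # - listed first, right after ^
printf '%s\n%s\n' "$last" "$first"   # both lines: -5.0

# Without \ in the set, a backslash in the input is removed as well.
no_bs=$(printf '%s\n' 'a\b -5.0' | sed 's/[^0-9.-]//g')
printf '%s\n' "$no_bs"   # -5.0
```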

I'm closing this as not a bug, but feel free to reply with further 
questions or comments.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


This bug report was last modified 6 years and 214 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.