GNU bug report logs - #25750
[sed] Matching square brackets

Package: sed

Reported by: 林自均 <johnlinp <at> gmail.com>

Date: Thu, 16 Feb 2017 07:16:02 UTC

Severity: normal

Done: Bob Proulx <bob <at> proulx.com>

Bug is archived. No further changes may be made.


Report forwarded to bug-sed <at> gnu.org:
bug#25750; Package sed. (Thu, 16 Feb 2017 07:16:02 GMT)

Acknowledgement sent to 林自均 <johnlinp <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Thu, 16 Feb 2017 07:16:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: 林自均 <johnlinp <at> gmail.com>
To: bug-sed <at> gnu.org, jim <at> meyering.net, agn <at> gnu.org
Subject: [sed] Matching square brackets
Date: Thu, 16 Feb 2017 05:11:18 +0000
Hi sed maintainers,

I want to remove the square brackets in a string:

$ echo '[1,2,3]' | sed 's/\[//g' | sed 's/\]//g'
1,2,3

And it works.

However, when I want to do it in a single sed, it does not work:

$ echo '[1,2,3]' | sed 's/[\[\]]//g'
[1,2,3]

I can manage to make it work by a weird regexp:

$ echo '[1,2,3]' | sed 's/[]\[]//g'
1,2,3

Is that a bug? If it is, I would like to spend some time to fix it.

Thanks for reading this email.

Best,
John Lin

Information forwarded to bug-sed <at> gnu.org:
bug#25750; Package sed. (Thu, 16 Feb 2017 09:18:02 GMT)

Message #8 received at 25750 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: 林自均 <johnlinp <at> gmail.com>
Cc: 25750 <at> debbugs.gnu.org
Subject: Re: bug#25750: [sed] Matching square brackets
Date: Thu, 16 Feb 2017 02:16:58 -0700
林自均 wrote:
> I want to remove the square brackets in a string:
> 
> $ echo '[1,2,3]' | sed 's/\[//g' | sed 's/\]//g'
> 1,2,3
> 
> And it works.

Yes.  But the above isn't strictly correct regular expression usage.
Let's discuss it piece by piece.

  echo '[1,2,3]' |

Okay.  Good test pattern.

  sed 's/\[//g' |

Okay.  Since the [ would start a character class and you want it to
match itself it needs to be escaped.

  sed 's/\]//g'

This is not strictly correct.  You have escaped the ] as \].  But
that is not needed.  The ] does not do anything special in that
context.  It ends a character class started by a [ but outside of that
it is simply a normal character.  The escaped \] falls back to being
just a ] character.  But it is a bad habit to get into because
escaping other characters changes their meaning.  In GNU sed, for
example, \+ turns on the ERE-style "one or more" operator.  Your
expression should be the following instead.

  sed 's/]//g'

Those two could be combined into one sed command.

  echo '[1,2,3]' | sed -e 's/\[//g' -e 's/]//g'
    1,2,3

Or by a combined string split by the ';' separator.

  echo '[1,2,3]' | sed 's/\[//g;s/]//g'
    1,2,3

I tend to prefer the latter.  But either is fine.
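
As an aside on the escaping habit mentioned above, here is a minimal
sketch of the difference, assuming GNU sed (the special meaning of \+
in a basic regular expression is a GNU extension, so other sed
implementations may behave differently).  It uses printf rather than
echo only to keep the input handling predictable across shells.

```shell
#!/bin/sh
# Outside a bracket expression, ']' and '\]' match the same literal character.
printf '%s\n' '[1,2,3]' | sed 's/]//g'     # prints [1,2,3
printf '%s\n' '[1,2,3]' | sed 's/\]//g'    # prints [1,2,3

# With '+', the backslash changes the meaning (assumption: GNU sed).
printf '%s\n' 'aaa+' | sed 's/a+/X/'       # '+' is literal here: prints aaX
printf '%s\n' 'aaa+' | sed 's/a\+/X/'      # '\+' means "one or more": prints X+
```

The first pair shows why the stray backslash on ] goes unnoticed, and
the second pair shows why the same habit can bite with other
characters.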

> However, when I want to do it in a single sed, it does not work:
> 
> $ echo '[1,2,3]' | sed 's/[\[\]]//g'
> [1,2,3]

That is incorrect usage.  Do not escape characters inside of [...]
character classes.  The above is behaving correctly.

You are starting a character class to match any of the enclosed
characters.  That is good.  But then it is broken by escaping the
characters inside the character class.  Inside of a character class
there is nothing special about those characters because the class
turns off special characters, and that includes the \ itself.  So the
class is [\[\] which matches either a \ or a [ and it ends at the
first ].  The final ] is then a literal ] that must follow.  Your
input contains neither \] nor [] and so nothing matches.  That is the
problem.
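
To see what the escaped form really matches, the following sketch may
help.  The second input string is a made-up test pattern, not one from
the report.  The expression [\[\]] parses as the class [\[\] (a \ or a
[) followed by one literal ], so it only matches the two-character
sequences \] and [].

```shell
#!/bin/sh
# The original input has no '\]' or '[]' sequence, so nothing is replaced.
printf '%s\n' '[1,2,3]' | sed 's/[\[\]]//g'       # prints [1,2,3]

# A hypothetical input containing both sequences shows what does match.
printf '%s\n' 'a[]b\]c' | sed 's/[\[\]]/X/g'      # prints aXbXc
```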

Please review the documentation on regular expressions here:

  https://www.gnu.org/software/sed/manual/html_node/Character-Classes-and-Bracket-Expressions.html#Character-Classes-and-Bracket-Expressions

  Most meta-characters lose their special meaning inside bracket expressions:

  ']'  ends the bracket expression if it’s not the first list
       item. So, if you want to make the ‘]’ character a list item,
       you must put it first.

Therefore you must start the character class, then immediately put in
the ] to match itself literally.  It does not end the character class
since an empty class wouldn't make sense.

  [  -- start of the character class
  ]  -- match a literal ]
  [  -- match a literal [
  ]  -- end the class

Here is the working example:

  echo '[1,2,3]' | sed 's/[][]//g'
    1,2,3
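
The same rule can be wrapped in a small shell function.  This is only
a sketch; the name strip_brackets is made up for illustration.  The
last command shows the negated form, where the ] goes immediately
after the ^ instead.

```shell
#!/bin/sh
# strip_brackets: hypothetical helper that deletes every '[' and ']'.
strip_brackets() {
    # ']' is the first list item, so it is literal; '[' follows it.
    printf '%s\n' "$1" | sed 's/[][]//g'
}

strip_brackets '[1,2,3]'        # prints 1,2,3
strip_brackets '[[a],[b]]'      # prints a,b

# Negated class: ']' right after '^' is still literal.
# This deletes everything except the brackets.
printf '%s\n' '[1,2,3]' | sed 's/[^][]//g'    # prints []
```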

> I can manage to make it work by a weird regexp:
> 
> $ echo '[1,2,3]' | sed 's/[]\[]//g'
> 1,2,3

That is also incorrect usage.  You have added an additional \ into the
class.  You thought you were escaping the [ but since it is already
inside of a bracket expression the \ was simply a normal character
and matched itself.

  echo '[1,2,3]\1\2\3'
  [1,2,3]\1\2\3
  echo '[1,2,3]\1\2\3' | sed 's/[]\[]//g'
  1,2,3123
  echo '[1,2,3]\1\2\3' | sed 's/[][]//g'
  1,2,3\1\2\3

As you can see, including the \ in the class also removed the \
characters, because the \ became a member of the character class.

> Is that a bug? If it is, I would like to spend some time to fix it.

It is not a bug.  It is incorrect usage.  I will close the ticket.
But please let us know if this makes sense to you.  Feel free to
continue the discussion.

Bob




bug closed, send any further explanations to 25750 <at> debbugs.gnu.org and 林自均 <johnlinp <at> gmail.com> Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Thu, 16 Feb 2017 09:18:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 16 Mar 2017 11:24:03 GMT)

bug unarchived. Request was from 林自均 <johnlinp <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 28 Mar 2017 14:52:01 GMT)

Information forwarded to bug-sed <at> gnu.org:
bug#25750; Package sed. (Tue, 28 Mar 2017 14:53:01 GMT)

Message #17 received at 25750 <at> debbugs.gnu.org:

From: 林自均 <johnlinp <at> gmail.com>
To: Bob Proulx <bob <at> proulx.com>
Cc: 25750 <at> debbugs.gnu.org
Subject: Re: bug#25750: [sed] Matching square brackets
Date: Tue, 28 Mar 2017 14:52:35 +0000
Hi Bob,

Thank you for the detailed explanation. That was so helpful.

Best,
John Lin


bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 26 Apr 2017 11:24:04 GMT)

This bug report was last modified 8 years and 53 days ago.
