GNU bug report logs - #33763
RE backtrack for last slash fails when backslashblank involved


Package: sed;

Reported by: Peter Benjamin <pete <at> peterbenjamin.com>

Date: Sat, 15 Dec 2018 23:05:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


Report forwarded to bug-sed <at> gnu.org:
bug#33763; Package sed. (Sat, 15 Dec 2018 23:05:02 GMT)

Acknowledgement sent to Peter Benjamin <pete <at> peterbenjamin.com>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Sat, 15 Dec 2018 23:05:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Peter Benjamin <pete <at> peterbenjamin.com>
To: bug-sed <at> gnu.org
Subject: RE backtrack for last slash fails when backslashblank involved
Date: Sat, 15 Dec 2018 14:07:08 -0800
Backtrack last slash RE does not work when there are "\ " involved.

RE:
sed -e 's/^\(.*\)\/\([^\/]*\)$/\2\t\1\/\2/' findm

$ cat findm
/media/userid/data/movies/movie\ 1\ a.m4v
/media/userid/data/movies/movie\ 1\ a.extra.m4v
/media/userid/data/movies/movie\ 2.m4v
/media/userid/data/movies/movie\ 3.m4v
/media/userid/data/movies/movie4.m4v
/media/userid/data2/movies/data.m4v

STDOUT

$ sed -e 's/^\(.*\)\/\([^\/]*\)$/\2\t\1\/\2/' findm
/media/userid/data/movies/movie\ 1\ a.m4v
/media/userid/data/movies/movie\ 1\ a.extra.m4v
/media/userid/data/movies/movie\ 2.m4v
/media/userid/data/movies/movie\ 3.m4v
movie4.m4v	/media/userid/data/movies/movie4.m4v
data.m4v	/media/userid/data2/movies/data.m4v

----------------------------------------

Ubuntu 16.04

$ sed --version
sed (GNU sed) 4.2.2

$ uname -a
Linux *** 4.4.0-140-generic #166-Ubuntu SMP Wed Nov 14 20:09:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

------------------------

Same backtrack last slash RE in perl works:

perl -n -e 'chomp;s/^(.*)\/([^\/]*)$/\2\t\1\/\2/;print"$_\n"' findm

STDOUT
movie\ 1\ a.m4v	/media/userid/data/movies/movie\ 1\ a.m4v
movie\ 1\ a.extra.m4v	/media/userid/data/movies/movie\ 1\ a.extra.m4v
movie\ 2.m4v	/media/userid/data/movies/movie\ 2.m4v
movie\ 3.m4v	/media/userid/data/movies/movie\ 3.m4v
movie4.m4v	/media/userid/data/movies/movie4.m4v
data.m4v	/media/userid/data2/movies/data.m4v

The End

Information forwarded to bug-sed <at> gnu.org:
bug#33763; Package sed. (Sun, 16 Dec 2018 20:51:02 GMT)

Message #8 received at 33763 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Peter Benjamin <pete <at> peterbenjamin.com>, 33763 <at> debbugs.gnu.org
Subject: Re: bug#33763: RE backtrack for last slash fails when backslashblank
 involved
Date: Sun, 16 Dec 2018 13:49:52 -0700
tags 33763 notabug
close 33763
stop

Hello,

On 2018-12-15 3:07 p.m., Peter Benjamin wrote:
> Backtrack last slash RE does not work when there are "\ " involved.
> 
> RE:
> sed -e 's/^\(.*\)\/\([^\/]*\)$/\2\t\1\/\2/' findm
> 
> $ cat findm
> /media/userid/data/movies/movie\ 1\ a.m4v
> /media/userid/data/movies/movie\ 1\ a.extra.m4v
> /media/userid/data/movies/movie\ 2.m4v
> /media/userid/data/movies/movie\ 3.m4v
> /media/userid/data/movies/movie4.m4v
> /media/userid/data2/movies/data.m4v
> 
> STDOUT
> 
> $ sed -e 's/^\(.*\)\/\([^\/]*\)$/\2\t\1\/\2/' findm
> /media/userid/data/movies/movie\ 1\ a.m4v
> /media/userid/data/movies/movie\ 1\ a.extra.m4v
> /media/userid/data/movies/movie\ 2.m4v
> /media/userid/data/movies/movie\ 3.m4v
> movie4.m4v	/media/userid/data/movies/movie4.m4v
> data.m4v	/media/userid/data2/movies/data.m4v
> 
> ------------------------
> 
> Same backtrack last slash RE in perl works:
> 
> perl -n -e 'chomp;s/^(.*)\/([^\/]*)$/\2\t\1\/\2/;print"$_\n"' findm
> 
> STDOUT
> movie\ 1\ a.m4v	/media/userid/data/movies/movie\ 1\ a.m4v
> movie\ 1\ a.extra.m4v	/media/userid/data/movies/movie\ 1\ a.extra.m4v
> movie\ 2.m4v	/media/userid/data/movies/movie\ 2.m4v
> movie\ 3.m4v	/media/userid/data/movies/movie\ 3.m4v
> movie4.m4v	/media/userid/data/movies/movie4.m4v
> data.m4v	/media/userid/data2/movies/data.m4v
> 

Thank you for providing such clear and reproducible examples -
it makes the troubleshooting much easier.

First,
let's enable sed's extended regular expression syntax (by adding "-E"),
to make the comparison simpler.
The following "sed -E" command is equivalent to the one you used above,
and produces the same (unsatisfying) results:

       sed -E -e 's/^(.*)\/([^\/]*)$/\2\t\1\/\2/'             findm
perl -n -e 'chomp;s/^(.*)\/([^\/]*)$/\2\t\1\/\2/;print"$_\n"' findm
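
The equivalence of the basic and extended forms can be checked with a
small sketch like the one below. The temporary file and the two sample
lines are stand-ins for the reporter's findm file, and the \t in the
replacement is assumed to be handled as in GNU sed:

```shell
# Recreate two of the sample lines from the report in a temporary file.
findm=$(mktemp)
printf '%s\n' \
  '/media/userid/data/movies/movie\ 3.m4v' \
  '/media/userid/data2/movies/data.m4v' > "$findm"

# Basic syntax: groups are written \( ... \).
bre_out=$(sed -e 's/^\(.*\)\/\([^\/]*\)$/\2\t\1\/\2/' "$findm")

# Extended syntax (-E): groups are written ( ... ).
ere_out=$(sed -E -e 's/^(.*)\/([^\/]*)$/\2\t\1\/\2/' "$findm")

# Both forms should produce identical output for the same input.
if [ "$bre_out" = "$ere_out" ]; then
  printf '%s\n' "BRE and ERE forms agree"
fi

rm -f "$findm"
```

Only the grouping syntax changes between the two commands; the bracket
expression is written the same way in both.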

Now, the culprit lies in the bracket expression:
   [^\/]

The POSIX definition of regular expression bracket expression says:

  "The special characters '.', '*', '[', and '\' (period, asterisk,
  left-bracket, and backslash, respectively) shall lose their special
  meaning within a bracket expression."

(from section 9.3.5, item 1, last sentence in the paragraph:
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_05)

Meaning, the bracket expression "[^\/]" is not "every character except
slash" (with the slash character escaped by a backslash).
Instead, it means "every character except slash or backslash".
Since the first four file names contain backslashes, the regex does not
match them.
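
The bracket semantics can be seen in isolation with a short sketch.
It uses grep rather than sed so that no s-command delimiter is involved,
and the three one-character input lines are made up for illustration:

```shell
# Three one-character lines: a backslash, a slash, and a letter.
chars=$(printf '%s\n' '\' '/' 'a')

# [^\/] excludes BOTH backslash and slash, so only the "a" line matches.
both_excluded=$(printf '%s\n' "$chars" | grep -c '^[^\/]$')

# [^/] excludes only the slash, so the backslash line and "a" both match.
slash_excluded=$(printf '%s\n' "$chars" | grep -c '^[^/]$')

printf 'lines matching %s: %s\n' '[^\/]' "$both_excluded"
printf 'lines matching %s: %s\n' '[^/]' "$slash_excluded"
```

The backslash line is counted only by the second pattern, which is the
same reason the file names containing "\ " were skipped.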

If the backslash is removed, the results are as you expected:

  $ sed -E -e 's/^(.*)\/([^/]*)$/\2\t\1\/\2/' findm
  movie\ 1\ a.m4v /media/userid/data/movies/movie\ 1\ a.m4v
  movie\ 1\ a.extra.m4v   /media/userid/data/movies/movie\ 1\ a.extra.m4v
  movie\ 2.m4v    /media/userid/data/movies/movie\ 2.m4v
  movie\ 3.m4v    /media/userid/data/movies/movie\ 3.m4v
  movie4.m4v      /media/userid/data/movies/movie4.m4v
  data.m4v        /media/userid/data2/movies/data.m4v
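
One way to avoid the temptation to escape the slash at all is to pick a
different delimiter for the s command. The following is only a sketch:
the choice of '|' is arbitrary, the temporary file stands in for findm,
and \t in the replacement is assumed to work as in GNU sed:

```shell
# Recreate two of the sample lines from the report in a temporary file.
findm=$(mktemp)
printf '%s\n' \
  '/media/userid/data/movies/movie\ 2.m4v' \
  '/media/userid/data/movies/movie4.m4v' > "$findm"

# With '|' as the delimiter, no slash needs a backslash anywhere,
# so the bracket expression is simply [^/].
result=$(sed -E 's|^(.*)/([^/]*)$|\2\t\1/\2|' "$findm")
printf '%s\n' "$result"

rm -f "$findm"
```

Both lines, with and without "\ " in the file name, should come out as
the file name, a tab, and the full path.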

As such, I conclude that it is not a sed bug.
In Perl, a backslash inside a bracket expression still acts as an escape
character, so "[^\/]" excludes only the slash, which leads to this
apparent difference.

I'm closing this as "not a bug",
but discussion can continue by replying to this thread.


regards,
 - assaf





Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 16 Dec 2018 20:51:03 GMT)

bug closed, send any further explanations to 33763 <at> debbugs.gnu.org and Peter Benjamin <pete <at> peterbenjamin.com> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 16 Dec 2018 20:51:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 14 Jan 2019 12:24:04 GMT)
