GNU bug report logs - #41558
Regexp Bug

Package: sed

Reported by: "anton.paras" <anton <at> paras.nu>

Date: Wed, 27 May 2020 04:16:02 UTC

Severity: normal

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: "anton.paras" <anton <at> paras.nu>, "bug-gnulib <at> gnu.org List" <bug-gnulib <at> gnu.org>, 41558 <at> debbugs.gnu.org
Subject: bug#41558: Regexp Bug
Date: Tue, 22 Sep 2020 16:40:02 -0700

On Wed, May 27, 2020 at 11:30 PM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> On Tue, 26 May 2020 21:14:12 -0700
> "anton.paras" <anton <at> paras.nu> wrote:
>
> > I posted to Stack Exchange, and they recommended that I file a bug. I'd rather not copy+paste it all, so here's the link:
> >
> > https://unix.stackexchange.com/questions/579889/why-doesnt-this-sed-command-replace-the-3rd-to-last-and
> >
> > here's an example
> >
> > > echo 'dog and foo and bar and baz land good' | sed -E 's/(.*)\band\b((.*\band\b){2})/\1XYZ\2/'
> >
> > expected output: dog XYZ foo and bar and baz land good
> >
> > actual output: dog and foo XYZ bar and baz land good
> >
> > here's my sed --version output: sed (GNU sed) 4.2.2
> >
> > I hope this is helpful, cheers!
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It seems that there is a bug in regex.
>
> expected:
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It also reproduces in grep.
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '(.*\band){2}'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '.*\band.*\band'
> $

I agree that this looks like a regex bug. This should print nothing:
  echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
just as this already does:
  echo 'foo and bar land' | env LC_ALL=C sed -nE '/(.*\band){2}/p'

Does anyone know if there's a glibc bug number for it?
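
For anyone who wants to check their own build, here is a minimal sketch. It runs the bounded-repetition pattern and its hand-expanded form on the input from this thread and reports whether they select the same lines. The helper name check_locale is hypothetical (not from the thread). The script assumes GNU sed (for -E and \b) and that the named locale is installed; if the locale is missing, sed will likely fall back to the C locale, in which case the result says nothing about UTF-8.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): report whether two patterns that
# should select the same lines actually do so in the given locale.
# Assumption: "sed" is GNU sed, since -E with \b is a GNU extension.
check_locale() {
    loc=$1
    input='foo and bar land'

    # Bounded-repetition form from the report.
    a=$(printf '%s\n' "$input" | env LC_ALL="$loc" sed -nE '/(.*\band){2}/p')

    # Hand-expanded form; as a line selector it should behave identically.
    b=$(printf '%s\n' "$input" | env LC_ALL="$loc" sed -nE '/.*\band.*\band/p')

    if [ "$a" = "$b" ]; then
        echo "$loc: consistent"
    else
        echo "$loc: inconsistent"
    fi
}

check_locale C
check_locale en_US.utf8
```

On a setup like the one described above, the en_US.utf8 line would be expected to read "inconsistent" while the C line reads "consistent". A build with a corrected regex matcher should presumably report "consistent" for both.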




This bug report was last modified 4 years and 265 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.