GNU bug report logs - #41558
Regexp Bug
On Wed, May 27, 2020 at 11:30 PM Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> On Tue, 26 May 2020 21:14:12 -0700
> "anton.paras" <anton <at> paras.nu> wrote:
>
> > I posted to Stack Exchange, and they recommended that I file a bug. I'd rather not copy and paste it all, so here's the link:
> >
> > https://unix.stackexchange.com/questions/579889/why-doesnt-this-sed-command-replace-the-3rd-to-last-and
> >
> > Here's an example:
> >
> > $ echo 'dog and foo and bar and baz land good' | sed -E 's/(.*)\band\b((.*\band\b){2})/\1XYZ\2/'
> >
> > Expected output: dog XYZ foo and bar and baz land good
> >
> > Actual output: dog and foo XYZ bar and baz land good
> >
> > Here's my sed --version output: sed (GNU sed) 4.2.2
> >
> > I hope this is helpful, cheers!
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It seems that there is a bug in regex.
>
> expected:
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/.*\band.*\band/p'
> $
>
> It also reproduces in grep.
>
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '(.*\band){2}'
> foo and bar land
> $ echo 'foo and bar land' | env LC_ALL=en_US.utf8 grep -E '.*\band.*\band'
> $
I agree that this looks like a regex bug. This should print nothing:
echo 'foo and bar land' | env LC_ALL=en_US.utf8 sed -nE '/(.*\band){2}/p'
just as this already does:
echo 'foo and bar land' | env LC_ALL=C sed -nE '/(.*\band){2}/p'
Does anyone know if there's a glibc bug number for it?
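The locale comparison above can be scripted as a small reproducer. This is a sketch, assuming GNU sed (for -E and the \b word-boundary extension) and an installed en_US.utf8 locale; the helper name check is hypothetical. It runs Norihiro's repeated-group pattern and its unrolled equivalent under both LC_ALL=C and LC_ALL=en_US.utf8.

```shell
#!/bin/sh
# Reproducer sketch for bug#41558. Assumes GNU sed (-E and \b) and an
# installed en_US.utf8 locale. "foo and bar land" contains only one
# whole-word "and" ("land" has no word boundary before "and"), so
# neither pattern should match in any locale.

input='foo and bar land'

# check LOCALE PATTERN: hypothetical helper that reports whether
# sed -nE prints $input for PATTERN when run under LC_ALL=LOCALE.
check() {
  lc=$1
  pattern=$2
  if [ -n "$(printf '%s\n' "$input" | LC_ALL=$lc sed -nE "/$pattern/p")" ]; then
    result='match (unexpected)'
  else
    result='no match (expected)'
  fi
  # printf rather than echo: some shells' echo turns "\b" into a backspace.
  printf 'LC_ALL=%s %s: %s\n' "$lc" "$pattern" "$result"
}

for loc in C en_US.utf8; do
  check "$loc" '(.*\band){2}'    # repeated group (the failing form)
  check "$loc" '.*\band.*\band'  # unrolled equivalent
done
```

On an affected system the output should match the report: only the en_US.utf8 run of '(.*\band){2}' reports a match. If en_US.utf8 is not installed, sed stays in the C locale and all four lines report no match, so check `locale -a` before drawing conclusions.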