GNU bug report logs - #18806
grep -rP getline crashes prematurely (without displaying all results) on invalid UTF-8 input with LC_ALL=en_US.UTF-8

Package: grep

Reported by: Shlomi Fish <shlomif <at> shlomifish.org>

Date: Thu, 23 Oct 2014 11:16:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18806 <18806 <at> debbugs.gnu.org>, 18806-done <18806-done <at> debbugs.gnu.org>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>, Shlomi Fish <shlomif <at> shlomifish.org>
Subject: bug#18806: grep -rP getline crashes prematurely (without displaying all results) on invalid UTF-8 input with LC_ALL=en_US.UTF-8
Date: Sat, 25 Oct 2014 18:24:24 -0700

On Sat, Oct 25, 2014 at 4:11 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> I'm getting a failure in pcre-invalid-utf8-input both before and after the
> change, with CentOS 6.5 and pcre-7.8-6.el6.x86_64.  In my case the failures
> are segmentation violations; perhaps 7.8-4 has a different failure mode, or
> perhaps there's some other minor change to your platform that causes libpcre
> to infloop.  Either way, this appears to be a PCRE bug that grep can't be
> expected to work around.
>
> Does the attached patch cause the test to fail reliably for you, instead of
> looping?

Yes.  And a timeout of 3s should be fine.  Thanks.  Please push that.
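For readers without the attachment at hand: the idea of turning a hang
into a reliable failure can be sketched with coreutils' timeout, which
exits with status 124 when the time limit expires.  This is only an
illustration and not the patch itself; the helper name run_bounded and
the sample input in the comment are assumptions.

```shell
#!/bin/sh
# Sketch only: run a command under a time limit so that an infinite
# loop shows up as an ordinary failure instead of a hung test.
# run_bounded is a hypothetical helper, not part of grep's test suite.
run_bounded()
{
  limit=$1; shift
  timeout "$limit" "$@"
  rc=$?
  # GNU timeout exits with 124 when the command ran past the limit.
  if test "$rc" -eq 124; then
    echo "FAIL: timed out after ${limit}s: $*" >&2
    return 1
  fi
  # Otherwise pass the command's own exit status through unchanged.
  return "$rc"
}

# Possible use with a 3-second limit (the input bytes are illustrative):
#   printf 'j\202\nj\n' > in
#   run_bounded 3 env LC_ALL=en_US.UTF-8 grep -P j in
```

A crash such as a segmentation violation still surfaces through the
passed-through exit status, so both failure modes described above end
the test promptly.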

I've just built grep against the latest pcre from git (an Oct 10 commit
with this hash: cc48a55a5de9c2103f6657147149bcf63ff61579), and with that
build all of grep's tests pass.

Ideally, we would detect and warn about inadequate versions of pcre,
but that certainly need not block the release.
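One rough way such a check might look, assuming the installed version
is available as a "major.minor" string (for example from
pcre-config --version).  The helper name pcre_at_least is hypothetical,
and the minimum shown in the usage comment is a placeholder: this
thread does not establish which pcre version is adequate.

```shell
#!/bin/sh
# Sketch only: succeed if a PCRE version string is at least a given
# minimum.  pcre_at_least is a hypothetical helper name.
pcre_at_least()
{
  have=${1%% *}                     # "8.36 2014-09-26" -> "8.36"
  want=$2
  have_major=${have%%.*}
  have_minor=${have#*.}
  have_minor=${have_minor%%[!0-9]*} # drop suffixes such as "-RC1"
  want_major=${want%%.*}
  want_minor=${want#*.}
  if test "$have_major" -ne "$want_major"; then
    test "$have_major" -gt "$want_major"
  else
    test "$have_minor" -ge "$want_minor"
  fi
}

# Possible use; "8.0" is a placeholder, not a known-good minimum:
#   pcre_at_least "$(pcre-config --version)" 8.0 ||
#     echo "warning: installed pcre may be too old for grep -P" >&2
```

Comparing with test rather than shell arithmetic avoids treating a
minor number with a leading zero, such as 08, as octal.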

> By the way, I'm not sure why tests distinguish between
> require_en_utf8_locale_ and require_compiled_in_MB_support; the latter
> requires the former, and there's no point requiring the former unless we
> also require the latter.

It looks like I added the require_compiled_in_MB_support function in
grep commit v2.9-27-g46e5cc6, yet never realized that it subsumed
require_en_utf8_locale_.  You're welcome to clean up after the release.





GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.