GNU bug report logs - #34492
rx: ASCII-raw byte ranges comprise all of Unicode

Previous Next

Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 15 Feb 2019 18:25:02 UTC

Severity: normal

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.

Full log


Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: bug-gnu-emacs <at> gnu.org
Subject: rx: ASCII-raw byte ranges comprise all of Unicode
Date: Fri, 15 Feb 2019 19:23:56 +0100
`rx' incorrectly considers character ranges between ASCII and raw bytes to cover all codes in-between, which includes all non-ASCII Unicode chars.
This causes (any "\000-\377" ?Å) to be simplified to (any "\000-\377"), which is not at all the same thing: [\000-\377] really means [\000-\177\200-\377] -- the transformation is normally made by the Emacs regexp engine. The two ranges are not contiguous on the character code level.

It's a sleeper bug that was awakened by my fixing bug#33205, so I'm to blame for not checking this.





This bug report was last modified 6 years and 95 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.