#19878 - 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter

GNU bug report logs - #19878
24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter

Package: emacs

Reported by: mohammad.mahmoudi <at> gmail.com

Date: Sun, 15 Feb 2015 19:25:02 UTC

Severity: normal

Found in version 24.4

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Message #22 received at 19878-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: politza <at> hochschule-trier.de, mohammad.mahmoudi <at> gmail.com
Cc: 19878-done <at> debbugs.gnu.org
Subject: Re: bug#19878: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
Date: Sat, 28 Feb 2015 14:29:52 +0200

> Date: Tue, 17 Feb 2015 18:13:05 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: mohammad.mahmoudi <at> gmail.com, 19878 <at> debbugs.gnu.org
>
> > From: Andreas Politz <politza <at> hochschule-trier.de>
> > Date: Sun, 15 Feb 2015 21:16:13 +0100
> > Cc: 19878 <at> debbugs.gnu.org
> >
> >
> > I think this is supposed to be:
> >
> > ,----[ (info "(elisp) Char Classes") ]
> > | `[:alpha:]'
> > |      This matches any letter.  (At present, for multibyte characters, it
> > |      matches anything that has word syntax.)
> > `----
>
> Indeed, which doesn't sound very nice.
>
> Does someone object to the changes below (to be installed on master)?
> They make [:alpha:] and [:alnum:] closer to the Unicode
> recommendations in UTS #18, although we are still very far from
> supporting even Level 1 of conformance.  But these two seem like
> low-hanging fruit to me.
>
> The modified definitions of these two sets are not 100% compatible
> with the old ones for the multibyte characters.  However, if it turns
> out that some code used these to get word-constituent characters,
> those places should simply be changed to use \sw instead.

No further comments, so I pushed the changes as commit 1a50945 on the
master branch, and I'm marking this bug closed.

> Also, does someone see any potential problem to make [:digit:] be a
> superset of the current ASCII-only set, to match UTS #18 as well?  The
> comment in regex.c says it is "only used for single-byte characters",
> but it isn't clear to me whether this is a requirement, i.e. there's
> some code in Emacs that relies on that, or just a statement of facts.

I'd still like to hear an answer and/or opinions about this.  If I
hear no comments, I will look into making a similar change to
[:digit:] soon.
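
The distinction discussed in the message above (characters that are word constituents versus characters that are letters) can be sketched outside Emacs. The following Python snippet is only an illustration under stated assumptions: it uses Python's `unicodedata` and `re` modules, not the Emacs regex engine, and the helper names `general_categories`, `is_word_like`, and `is_letters_only` are hypothetical. Python's `\w` is used as a rough analogue of "word syntax", and the general category as a rough approximation of the Unicode-based notion of a letter; neither is claimed to be what commit 1a50945 implements.

```python
import re
import unicodedata

# Digits copied verbatim from the bug title.
DIGITS = "۱۲۳۴۵۶۷۸۹۰"


def general_categories(text):
    """Return the set of Unicode general categories found in text."""
    return {unicodedata.category(ch) for ch in text}


def is_word_like(text):
    """Rough analogue of 'word constituent': letters, digits, underscore."""
    return re.fullmatch(r"\w+", text) is not None


def is_letters_only(text):
    """Letters only: word characters with digits and underscore excluded."""
    return re.fullmatch(r"[^\W\d_]+", text) is not None


if __name__ == "__main__":
    # The digits are decimal numbers (category Nd), not letters.
    print(general_categories(DIGITS))
    # They count as word characters, but not as letters.
    print(is_word_like(DIGITS), is_letters_only(DIGITS))
```

A class defined as "anything word-like" accepts these digits, while a class defined in terms of Unicode letter categories rejects them, which is the behaviour the report asks of [:alpha:].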

