GNU bug report logs - #19878
24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
Reported by: mohammad.mahmoudi <at> gmail.com
Date: Sun, 15 Feb 2015 19:25:02 UTC
Severity: normal
Found in version 24.4
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your bug report
#19878: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
which was filed against the emacs package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 19878 <at> debbugs.gnu.org.
--
19878: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19878
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
> Date: Tue, 17 Feb 2015 18:13:05 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: mohammad.mahmoudi <at> gmail.com, 19878 <at> debbugs.gnu.org
>
> > From: Andreas Politz <politza <at> hochschule-trier.de>
> > Date: Sun, 15 Feb 2015 21:16:13 +0100
> > Cc: 19878 <at> debbugs.gnu.org
> >
> >
> > I think this is supposed to be:
> >
> > ,----[ (info "(elisp) Char Classes") ]
> > | `[:alpha:]'
> > | This matches any letter. (At present, for multibyte characters, it
> > | matches anything that has word syntax.)
> > `----
>
> Indeed, which doesn't sound very nice.
>
> Does someone object to the changes below (to be installed on master)?
> They make [:alpha:] and [:alnum:] closer to the Unicode
> recommendations in UTS #18, although we are still very far from
> supporting even Level 1 of conformance. But these two seem like
> low-hanging fruit to me.
>
> The modified definitions of these two sets are not 100% compatible
> with the old ones for the multibyte characters. However, if it turns
> out that some code used these to get word-constituent characters,
> those places should simply be changed to use \sw instead.
No further comments, so I pushed the changes as commit 1a50945 on the
master branch, and I'm marking this bug closed.
> Also, does someone see any potential problem to make [:digit:] be a
> superset of the current ASCII-only set, to match UTS #18 as well? The
> comment in regex.c says it is "only used for single-byte characters",
> but it isn't clear to me whether this is a requirement, i.e. there's
> some code in Emacs that relies on that, or just a statement of facts.
I'd still like to hear an answer and/or opinions about this. If I
hear no comments, I will look into making a similar change to
[:digit:] soon.
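
For readers following the UTS #18 reference, here is a rough sketch of property-based class definitions. It is written in Python rather than Emacs Lisp or C, and it is not the code from commit 1a50945. UTS #18 defines alpha through the Alphabetic property, which Python's `unicodedata` module does not expose, so `is_alpha` below approximates it with general categories; that approximation and all three helper names are assumptions made for illustration.

```python
import re
import unicodedata


def is_alpha(ch: str) -> bool:
    # Approximation of the Alphabetic property: letters (L*) and
    # letter numbers (Nl). The real property covers a few more characters.
    cat = unicodedata.category(ch)
    return cat.startswith("L") or cat == "Nl"


def is_digit(ch: str) -> bool:
    # UTS #18 style digit: general category Nd (Decimal_Number).
    return unicodedata.category(ch) == "Nd"


def is_alnum(ch: str) -> bool:
    # alnum is the union of alpha and digit.
    return is_alpha(ch) or is_digit(ch)


# Latin letter, Arabic letter, ASCII digit, non-ASCII decimal digit, underscore.
samples = ["a", "\u0628", "7", "\u06f7", "_"]
for ch in samples:
    print(f"U+{ord(ch):04X}", is_alpha(ch), is_digit(ch), is_alnum(ch))

# An ASCII-only digit class versus a Unicode-aware superset of it.
ascii_only = re.fullmatch(r"[0-9]", "\u06f7")
unicode_aware = re.fullmatch(r"\d", "\u06f7")
print(ascii_only is None, unicode_aware is not None)
```

Under these definitions a non-ASCII decimal digit falls in digit and alnum but not in alpha, and the ASCII range 0-9 stays a subset of digit. Code that really wants word constituents would still need a separate test, which in Emacs is what \sw provides.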
[Message part 3 (message/rfc822, inline)]
This is to report that the character class [:alpha:] wrongly matches the
Indian digits ۱۲۳۴۵۶۷۸۹۰ as letters.
In GNU Emacs 24.4.1 (i686-pc-mingw32)
of 2014-10-24 on LEG570
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
`configure --prefix=/c/usr'
Important settings:
value of $LANG: ENU
locale-coding-system: cp1256
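
As a quick check outside Emacs, the following Python sketch looks up the Unicode general category and decimal value of each digit quoted in the report. It only inspects Unicode character data and says nothing about how any Emacs version classifies these characters; the variable names are illustrative.

```python
import unicodedata

# The ten digits quoted in the report.
reported = "۱۲۳۴۵۶۷۸۹۰"

# One row per character: code point, general category, decimal value.
rows = [
    (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.decimal(ch))
    for ch in reported
]
for code_point, category, value in rows:
    print(code_point, category, value)

# Characters whose general category is a letter category (L*).
letters = [ch for ch in reported if unicodedata.category(ch).startswith("L")]
print("letters found:", letters)
```

Every character is expected to carry category Nd (decimal number) and none a letter category, which is consistent with the reporter's view that a letter class should not match them.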
This bug report was last modified 10 years and 137 days ago.