GNU bug report logs - #19878
24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter

Package: emacs;

Reported by: mohammad.mahmoudi <at> gmail.com

Date: Sun, 15 Feb 2015 19:25:02 UTC

Severity: normal

Found in version 24.4

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#19878: closed (24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter)
Date: Sat, 28 Feb 2015 12:31:02 +0000

[Message part 1 (text/plain, inline)]

Your message dated Sat, 28 Feb 2015 14:29:52 +0200
with message-id <83bnkete0v.fsf <at> gnu.org>
and subject line Re: bug#19878: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
has caused the debbugs.gnu.org bug report #19878,
regarding 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
19878: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19878
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems

[Message part 2 (message/rfc822, inline)]

From: mohammad.mahmoudi <at> gmail.com
To: bug-gnu-emacs <at> gnu.org
Subject: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
Date: Sun, 15 Feb 2015 19:14:57 +0330 (Iran Standard Time)

This is to report that the syntax class [:alpha:] wrongly matches the
Indian digits ۱۲۳۴۵۶۷۸۹۰ as letters.


In GNU Emacs 24.4.1 (i686-pc-mingw32)
 of 2014-10-24 on LEG570
Windowing system distributor `Microsoft Corp.', version 6.1.7601
 Configured using:
 `configure --prefix=/c/usr'

 Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1256

[Message part 3 (message/rfc822, inline)]

From: Eli Zaretskii <eliz <at> gnu.org>
To: politza <at> hochschule-trier.de, mohammad.mahmoudi <at> gmail.com
Cc: 19878-done <at> debbugs.gnu.org
Subject: Re: bug#19878: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
Date: Sat, 28 Feb 2015 14:29:52 +0200

> Date: Tue, 17 Feb 2015 18:13:05 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: mohammad.mahmoudi <at> gmail.com, 19878 <at> debbugs.gnu.org
> 
> > From: Andreas Politz <politza <at> hochschule-trier.de>
> > Date: Sun, 15 Feb 2015 21:16:13 +0100
> > Cc: 19878 <at> debbugs.gnu.org
> > 
> > 
> > I think this is supposed to be:
> > 
> > ,----[ (info "(elisp) Char Classes") ]
> > | `[:alpha:]'
> > |      This matches any letter.  (At present, for multibyte characters, it
> > |      matches anything that has word syntax.)
> > `----
> 
> Indeed, which doesn't sound very nice.
> 
> Does someone object to the changes below (to be installed on master)?
> They make [:alpha:] and [:alnum:] closer to the Unicode
> recommendations in UTS #18, although we are still very far from
> supporting even Level 1 of conformance.  But these two seem like
> low-hanging fruit to me.
> 
> The modified definitions of these two sets are not 100% compatible
> with the old ones for the multibyte characters.  However, if it turns
> out that some code used these to get word-constituent characters,
> those places should simply be changed to use \sw instead.

No further comments, so I pushed the changes as commit 1a50945 on the
master branch, and I'm marking this bug closed.
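
The following Python sketch is an editorial illustration, not Emacs code, of the Unicode classification that the revised [:alpha:] and [:alnum:] move toward. It assumes the digits in the report are the Extended Arabic-Indic digits U+06F0 to U+06F9, whose general category is Nd (decimal number), so a Unicode-based [:alpha:] should reject them while [:alnum:] should accept them as digits. The helper names are hypothetical, and Python's `str.isalpha()` (general categories Lu, Ll, Lt, Lm, Lo) only approximates the Alphabetic property that UTS #18 specifies.

```python
import unicodedata

# Hypothetical helpers approximating the UTS #18 POSIX-compatible classes.
def is_digit_uts18(ch: str) -> bool:
    # UTS #18: digit = general category Nd (Decimal_Number).
    return unicodedata.category(ch) == "Nd"

def is_alpha_approx(ch: str) -> bool:
    # str.isalpha() is true for categories Lu, Ll, Lt, Lm, Lo; the real
    # Alphabetic property also covers Nl and Other_Alphabetic characters.
    return ch.isalpha()

def is_alnum_approx(ch: str) -> bool:
    # alnum = alpha plus digit.
    return is_alpha_approx(ch) or is_digit_uts18(ch)

# Assumed: the report's digits are U+06F0..U+06F9 (Extended Arabic-Indic).
persian_digits = [chr(cp) for cp in range(0x06F0, 0x06FA)]
arabic_beh = "\u0628"  # ARABIC LETTER BEH, category Lo, for contrast

for ch in persian_digits + [arabic_beh]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category={unicodedata.category(ch)} "
          f"alpha={is_alpha_approx(ch)} alnum={is_alnum_approx(ch)}")
```

Each digit reports category Nd with alpha False and alnum True, while the Arabic letter is alphabetic. Word syntax, by contrast, comes from the buffer's syntax table rather than from the Unicode data, which is presumably how the old definition let these digits through; code that really wants word constituents should use \sw, as suggested above.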

> Also, does someone see any potential problem to make [:digit:] be a
> superset of the current ASCII-only set, to match UTS #18 as well?  The
> comment in regex.c says it is "only used for single-byte characters",
> but it isn't clear to me whether this is a requirement, i.e. there's
> some code in Emacs that relies on that, or just a statement of facts.

I'd still like to hear an answer and/or opinions about this.  If I
hear no comments, I will look into making a similar change to
[:digit:] soon.
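
Another editorial Python sketch (again not Emacs code) of what making [:digit:] a superset of the ASCII set would mean. UTS #18 defines digit as general category Nd, and Python's `re` module already uses that definition for `\d` on str patterns, while `[0-9]` and the `re.ASCII` flag give the ASCII-only behavior. The variable names are illustrative, and the code points are assumed to match the digits in the report title.

```python
import re

# Assumed: the report's digits, in title order, are U+06F1..U+06F9 then U+06F0.
persian = "".join(chr(cp) for cp in range(0x06F1, 0x06FA)) + "\u06F0"
ascii_digits = "1234567890"

# ASCII-only class, comparable to the current [:digit:].
ascii_class = re.compile(r"[0-9]+")
# On str patterns, \d matches any character of category Nd (UTS #18 "digit").
unicode_class = re.compile(r"\d+")
# The re.ASCII flag restricts \d back to [0-9].
ascii_flag = re.compile(r"\d+", re.ASCII)

for label, rx in [("[0-9]", ascii_class),
                  (r"\d", unicode_class),
                  (r"\d with re.ASCII", ascii_flag)]:
    print(f"{label:16} ascii={rx.fullmatch(ascii_digits) is not None} "
          f"persian={rx.fullmatch(persian) is not None}")
```

Every ASCII digit also has category Nd, so the Unicode class accepts everything the ASCII one does plus the Persian digits; the open question in the message is whether any Emacs code depends on [:digit:] staying ASCII-only.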

This bug report was last modified 10 years and 137 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.