GNU bug report logs - #19878
24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter

Package: emacs

Reported by: mohammad.mahmoudi <at> gmail.com

Date: Sun, 15 Feb 2015 19:25:02 UTC

Severity: normal

Found in version 24.4

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: mohammad.mahmoudi <at> gmail.com
Subject: bug#19878: closed (Re: bug#19878: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter)
Date: Sat, 28 Feb 2015 12:31:03 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#19878: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 19878 <at> debbugs.gnu.org.

-- 
19878: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=19878
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Eli Zaretskii <eliz <at> gnu.org>
To: politza <at> hochschule-trier.de, mohammad.mahmoudi <at> gmail.com
Cc: 19878-done <at> debbugs.gnu.org
Subject: Re: bug#19878: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
Date: Sat, 28 Feb 2015 14:29:52 +0200
> Date: Tue, 17 Feb 2015 18:13:05 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: mohammad.mahmoudi <at> gmail.com, 19878 <at> debbugs.gnu.org
> 
> > From: Andreas Politz <politza <at> hochschule-trier.de>
> > Date: Sun, 15 Feb 2015 21:16:13 +0100
> > Cc: 19878 <at> debbugs.gnu.org
> > 
> > 
> > I think this is supposed to be:
> > 
> > ,----[ (info "(elisp) Char Classes") ]
> > | `[:alpha:]'
> > |      This matches any letter.  (At present, for multibyte characters, it
> > |      matches anything that has word syntax.)
> > `----
> 
> Indeed, which doesn't sound very nice.
> 
> Does someone object to the changes below (to be installed on master)?
> They make [:alpha:] and [:alnum:] closer to the Unicode
> recommendations in UTS #18, although we are still very far from
> supporting even Level 1 of conformance.  But these two seem like
> low-hanging fruit to me.
> 
> The modified definitions of these two sets are not 100% compatible
> with the old ones for the multibyte characters.  However, if it turns
> out that some code used these to get word-constituent characters,
> those places should simply be changed to use \sw instead.

No further comments, so I pushed the changes as commit 1a50945 on the
master branch, and I'm marking this bug closed.

> Also, does someone see any potential problem to make [:digit:] be a
> superset of the current ASCII-only set, to match UTS #18 as well?  The
> comment in regex.c says it is "only used for single-byte characters",
> but it isn't clear to me whether this is a requirement, i.e. there's
> some code in Emacs that relies on that, or just a statement of facts.

I'd still like to hear an answer and/or opinions about this.  If I
hear no comments, I will look into making a similar change to
[:digit:] soon.
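
The distinction discussed above (letters versus decimal digits, with
[:alnum:] as their union) can be illustrated outside Emacs. The following
is a minimal Python sketch, not the Emacs implementation and not the
contents of the pushed commit. The helper names is_alpha, is_digit and
is_alnum are hypothetical, and the set of general categories treated as
alphabetic is an approximation: the Unicode Alphabetic property that
UTS #18 refers to also includes characters flagged Other_Alphabetic,
which this sketch ignores.

```python
import unicodedata

# General categories treated as "alphabetic" in this sketch. This is an
# assumption: it approximates the Unicode Alphabetic property and leaves
# out characters that are alphabetic only via Other_Alphabetic.
ALPHA_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def is_alpha(ch: str) -> bool:
    """Approximate [:alpha:]: letters only, never decimal digits."""
    return unicodedata.category(ch) in ALPHA_CATEGORIES


def is_digit(ch: str) -> bool:
    """Approximate a Unicode-aware [:digit:]: any decimal digit (Nd)."""
    return unicodedata.category(ch) == "Nd"


def is_alnum(ch: str) -> bool:
    """Approximate [:alnum:]: alphabetic or decimal digit."""
    return is_alpha(ch) or is_digit(ch)


if __name__ == "__main__":
    # One Latin letter, one Arabic letter, one ASCII digit, one of the
    # digits from the report, and one punctuation character.
    for ch in ["a", "ب", "7", "۷", "_"]:
        print(repr(ch), unicodedata.category(ch),
              is_alpha(ch), is_digit(ch), is_alnum(ch))
```

Under these definitions a digit such as ۷ counts as [:digit:] and
[:alnum:] but not as [:alpha:], which is the behaviour the report asks
for. Code that actually wants word constituents would, as noted above,
use \sw instead.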

[Message part 3 (message/rfc822, inline)]
From: mohammad.mahmoudi <at> gmail.com
To: bug-gnu-emacs <at> gnu.org
Subject: 24.4; Syntax class [:alpha:] wrongly matches the Indian digits ۱۲۳۴۵۶۷۸۹۰ as letter
Date: Sun, 15 Feb 2015 19:14:57 +0330 (Iran Standard Time)
This is to report that the Syntax class [:alpha:] wrongly matches the
Indian digits ۱۲۳۴۵۶۷۸۹۰ as letters.


In GNU Emacs 24.4.1 (i686-pc-mingw32)
 of 2014-10-24 on LEG570
Windowing system distributor `Microsoft Corp.', version 6.1.7601
 Configured using:
 `configure --prefix=/c/usr'

 Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1256



This bug report was last modified 10 years and 137 days ago.

