GNU bug report logs - #25230
Patch to ispell.el to simplify use of [:alpha:] for CASECHARS in built-in dictionaries

Package: emacs;

Reported by: Reuben Thomas <rrt <at> sc3d.org>

Date: Mon, 19 Dec 2016 12:30:02 UTC

Severity: wishlist

Tags: fixed

Fixed in version 27.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

Message #8 received at 25230 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Reuben Thomas <rrt <at> sc3d.org>
Cc: 25230 <at> debbugs.gnu.org
Subject: Re: bug#25230: Patch to ispell.el to simplify use of [:alpha:] for
 CASECHARS in built-in dictionaries
Date: Mon, 19 Dec 2016 18:23:26 +0200
> From: Reuben Thomas <rrt <at> sc3d.org>
> Date: Mon, 19 Dec 2016 12:28:57 +0000
> 
> In ispell-set-spellchecker-params, there is code that used to be run conditionally on support for POSIX
> character classes, which sets all the CASECHARS and NOT-CASECHARS entries for built-in dictionaries to
> [[:alpha:]] and [^[:alpha:]] respectively.
> 
> There is no point doing this unconditionally, so instead, put these character classes directly into the initial
> values used in ispell-dictionary-base-alist. This change also makes the variable's initialization easier to read.
> 
> The attached patch makes these changes.
> 
> -     "[A-Za-z]" "[^A-Za-z]" "[']" nil ("-B") nil iso-8859-1)
> +     ;; just use a minimal regexp.
> +     "[[:alpha:]]" "[^[:alpha:]]" "[']" nil ("-B") nil iso-8859-1)

You are assuming that [[:alpha:]] and [A-Za-z] are identical.  But
they have been far from identical since Emacs 25.1.  I mentioned
this in another thread today.

>      ("brasileiro"			; Brazilian mode
> -     "[A-Z\301\311\315\323\332\300\310\314\322\331\303\325\307\334\302\312\324a-z\341\351\355\363\372\340\350\354\362\371\343\365\347\374\342\352\364]"
> -     "[^A-Z\301\311\315\323\332\300\310\314\322\331\303\325\307\334\302\312\324a-z\341\351\355\363\372\340\350\354\362\371\343\365\347\374\342\352\364]"
> -     "[']" nil nil nil iso-8859-1)
> +     "[[:alpha:]]" "[^[:alpha:]]" "[']" nil nil nil iso-8859-1)

Same here: [[:alpha:]] is much broader now than any set of characters
supported by a single language.

In any case, these settings are for Ispell, which only supports
single-byte encodings.  We cannot use arbitrary characters with it.

IOW, I don't think this patch is in the right direction.

Thanks.
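
The two objections above can be checked with a short Emacs Lisp sketch. It is an illustration only, not code from ispell.el: the helper name demo-classify-first-char and the sample strings are hypothetical, and iso-8859-1 is used because it is the coding system named in the quoted dictionary entries. The sketch reports whether the first character of a string matches [A-Za-z], whether it matches [[:alpha:]], and whether the whole string can be encoded in iso-8859-1.

```elisp
;; Hypothetical helper, for illustration only; not part of ispell.el.
(defun demo-classify-first-char (str)
  "Return a plist describing how the start of STR is classified.
:ascii is non-nil if STR starts with a character in [A-Za-z],
:alpha is non-nil if it starts with a character in [[:alpha:]],
:latin-1 is non-nil if all of STR can be encoded in iso-8859-1."
  (list :ascii (and (string-match-p "\\`[A-Za-z]" str) t)
        :alpha (and (string-match-p "\\`[[:alpha:]]" str) t)
        ;; `unencodable-char-position' returns nil when every
        ;; character in the range can be encoded.
        :latin-1 (null (unencodable-char-position
                        0 (length str) 'iso-8859-1 nil str))))

;; Sample inputs (assumptions chosen for the demonstration):
(demo-classify-first-char "a")       ; ASCII letter
(demo-classify-first-char "\u00e9")  ; e with acute accent, in Latin-1
(demo-classify-first-char "\u044f")  ; Cyrillic ya, not in Latin-1
```

With these inputs, the Cyrillic letter matches [[:alpha:]] but cannot be encoded in iso-8859-1, which is the mismatch Eli points out: the character class accepts characters that a single-byte Ispell dictionary cannot represent.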




This bug report was last modified 5 years and 333 days ago.
