GNU bug report logs - #39483
27.0.60; ispell ignores syntax/category tables word boundaries

Package: emacs;

Reported by: "Paul W. Rankin" <hello <at> paulwrankin.com>

Date: Fri, 7 Feb 2020 15:46:01 UTC

Severity: normal

Found in version 27.0.60

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 39483 in the body.
You can then email your comments to 39483 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#39483; Package emacs. (Fri, 07 Feb 2020 15:46:01 GMT)

Acknowledgement sent to "Paul W. Rankin" <hello <at> paulwrankin.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 07 Feb 2020 15:46:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Paul W. Rankin" <hello <at> paulwrankin.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.60; ispell ignores syntax/category tables word boundaries
Date: Sat, 08 Feb 2020 01:44:52 +1000
Hello,

It appears that the function `ispell-get-word' makes its own judgements
on word boundaries, ignoring the buffer's syntax tables and character
categories. This becomes a problem with using `electric-quote-mode' and
ispell, because contractions are parsed as separate words. e.g. Calling
`ispell-word' for "doesn’t" returns:

    T is correct

To reproduce:

1. emacs -Q
2. (in *scratch*) M-x text-mode RET
3. enter text "doesn’t" (i.e. "doesn" C-x 8 ] "t")
4. M-: (modify-syntax-entry ?’ "w")
5. M-: (modify-category-entry ?’ ?^)
6. M-$ | ispell-word

Expected results:

Given the above syntax and category tables, M-f | forward-word and M-b |
backward-word now consider "doesn’t" as a single word, so it should be
passed whole to the `ispell-program-name' and produce the same result as
when checked on the command line:

    % echo "doesn’t" | aspell -a
    @(#) International Ispell Version 3.1.20 (but really Aspell 0.60.8)
    *
    % echo "doesn’t" | enchant-2 -a
    @(#) International Ispell Version 3.1.20 (but really Enchant 2.2.7)
    *

Actual results:

The word "doesn’t" is parsed as "t":
    T is correct

Attempts at workarounds:

I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
entries from "[']" to "['’]" to no avail.

Setup:

GNU Emacs 27.0.60 (build 2, x86_64-apple-darwin19.3.0, NS appkit-1894.30
Version 10.15.3 (Build 19D76)) of 2020-02-05

-- 
Paul W. Rankin
https://www.paulwrankin.com




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39483; Package emacs. (Fri, 07 Feb 2020 18:25:01 GMT)

Message #8 received at 39483 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Paul W. Rankin" <hello <at> paulwrankin.com>
Cc: 39483 <at> debbugs.gnu.org
Subject: Re: bug#39483: 27.0.60;
 ispell ignores syntax/category tables word boundaries
Date: Fri, 07 Feb 2020 20:23:33 +0200
> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
> Date: Sat, 08 Feb 2020 01:44:52 +1000
> 
> It appears that the function `ispell-get-word' makes its own judgements
> on word boundaries, ignoring the buffer's syntax tables and character
> categories.

That is true.  And I don't really see how it can be any different,
since ispell.el must have the same notion of a word as the underlying
dictionary, otherwise you will have false positives and/or false
negatives, right?

ispell.el looks up the word characters and non-word characters in
its database, and the doc string of ispell-dictionary-base-alist
explains how.

> This becomes a problem with using `electric-quote-mode' and
> ispell, because contractions are parsed as separate words. e.g. Calling
> `ispell-word' for "doesn’t" returns:
> 
>     T is correct
> 
> To reproduce:
> 
> 1. emacs -Q
> 2. (in *scratch*) M-x text-mode RET
> 3. enter text "doesn’t" (i.e. "doesn" C-x 8 ] "t")
> 4. M-: (modify-syntax-entry ?’ "w")
> 5. M-: (modify-category-entry ?’ ?^)
> 6. M-$ | ispell-word

The buffer syntax table has no effect on ispell.el, and shouldn't have
any effect on it.

> Attempts at workarounds:
> 
> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
> entries from "[']" to "['’]" to no avail.

That's the right direction, but you didn't follow it far enough.
First, ispell-dictionary-base-alist is the default value, and is used
to produce ispell-dictionary-alist, which is the one you should change
(alternatively, customize ispell-local-dictionary-alist).  More
importantly, the definitions of each dictionary include more than just
one character set: there are 3 character sets there and one parameter
for encoding the string passed to the spell-checker, and you should be
sure to set them all as appropriate for the dictionary you use.
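
To make the four relevant slots concrete, here is a minimal sketch of a
helper that reports what a dictionary entry currently contains. The
function name `my/ispell-describe-dictionary' is hypothetical (it is not
part of ispell.el), and the local-alist-first lookup order is an
assumption about how ispell.el resolves entries. The slot positions
follow the doc string of ispell-dictionary-alist.

```elisp
(require 'ispell)

;; Hypothetical helper, not part of ispell.el.
(defun my/ispell-describe-dictionary (name)
  "Return a plist of the character-set slots for dictionary NAME.
NAME is a string such as \"en_US\", or nil for the default entry.
Return nil if no entry for NAME is found."
  ;; Assumption: entries in `ispell-local-dictionary-alist' take
  ;; precedence over those in `ispell-dictionary-alist'.
  (let ((entry (or (assoc name ispell-local-dictionary-alist)
                   (assoc name ispell-dictionary-alist))))
    (when entry
      (list :casechars     (nth 1 entry)   ; characters that form words
            :not-casechars (nth 2 entry)   ; characters that never do
            :otherchars    (nth 3 entry)   ; e.g. apostrophes inside words
            :coding-system (nth 7 entry))))) ; encoding sent to the checker
```

Evaluating (my/ispell-describe-dictionary "en_US") with M-: would then
show whether OTHERCHARS includes the character in question. Note that
ispell-dictionary-alist may be empty until ispell.el has initialized
itself, for example after the first spell-check in the session.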

My suggestion is to step with Edebug through ispell-get-word and see
why it doesn't consider "doesn’t" as a single word in your case.

> Setup:
> 
> GNU Emacs 27.0.60 (build 2, x86_64-apple-darwin19.3.0, NS appkit-1894.30
> Version 10.15.3 (Build 19D76)) of 2020-02-05

This omits crucial information, like the dictionary in use and the
locale-dependent settings that affect encoding.  (In any case, I don't
think this list is the right place for discussing this issue.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39483; Package emacs. (Sat, 08 Feb 2020 05:48:01 GMT)

Message #11 received at 39483 <at> debbugs.gnu.org:

From: "Paul W. Rankin" <hello <at> paulwrankin.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39483 <at> debbugs.gnu.org
Subject: Re: bug#39483: 27.0.60; ispell ignores syntax/category tables word
 boundaries
Date: Sat, 08 Feb 2020 15:47:27 +1000
On Sat, Feb 08 2020, Eli Zaretskii wrote:

>> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
>> Attempts at workarounds:
>> 
>> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
>> entries from "[']" to "['’]" to no avail.
>
> That's the right direction, but you didn't follow it far enough.
> First, ispell-dictionary-base-alist is the default value, and is used
> to produce ispell-dictionary-alist, which is the one you should change
> (alternatively, customize ispell-local-dictionary-alist).

Thanks, that got it.

I'd discussed this on #emacs IRC and the consensus was to report. Led
astray!!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39483; Package emacs. (Sat, 08 Feb 2020 08:19:01 GMT)

Message #14 received at 39483 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Paul W. Rankin" <hello <at> paulwrankin.com>
Cc: 39483 <at> debbugs.gnu.org
Subject: Re: bug#39483: 27.0.60; ispell ignores syntax/category tables word
 boundaries
Date: Sat, 08 Feb 2020 10:18:20 +0200
> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
> Cc: 39483 <at> debbugs.gnu.org
> Date: Sat, 08 Feb 2020 15:47:27 +1000
> 
> On Sat, Feb 08 2020, Eli Zaretskii wrote:
> 
> >> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
> >> Attempts at workarounds:
> >> 
> >> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
> >> entries from "[']" to "['’]" to no avail.
> >
> > That's the right direction, but you didn't follow it far enough.
> > First, ispell-dictionary-base-alist is the default value, and is used
> > to produce ispell-dictionary-alist, which is the one you should change
> > (alternatively, customize ispell-local-dictionary-alist).
> 
> Thanks, that got it.

I'd be interested to see your solution in full, for the record.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39483; Package emacs. (Sat, 08 Feb 2020 09:29:02 GMT)

Message #17 received at 39483 <at> debbugs.gnu.org:

From: "Paul W. Rankin" <hello <at> paulwrankin.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 39483 <at> debbugs.gnu.org
Subject: Re: bug#39483: 27.0.60; ispell ignores syntax/category tables word
 boundaries
Date: Sat, 08 Feb 2020 19:28:40 +1000
On Sat, Feb 08 2020, Eli Zaretskii wrote:
>> >> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
>> >> Attempts at workarounds:
>> >>
>> >> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
>> >> entries from "[']" to "['’]" to no avail.
>> >
>> > That's the right direction, but you didn't follow it far enough.
>> > First, ispell-dictionary-base-alist is the default value, and is used
>> > to produce ispell-dictionary-alist, which is the one you should change
>> > (alternatively, customize ispell-local-dictionary-alist).
>>
>> Thanks, that got it.
>
> I'd be interested to see your solution in full, for the record.

I went down the wrong path with syntax tables when I saw M-f/M-b was
stepping through the word like doesn|’|t| so I figured it was about word
boundaries. Searching through the manual I couldn't find anything in
"(emacs) Quotation Marks" or "(emacs) Spelling" but found the references
to syntax tables regarding word boundaries in "(elisp) Word Motion".

As it turns out it was just a case of customising
ispell-local-dictionary-alist and adding both a default and "en_US"
entry with OTHERCHARS regexp as "['’]" pretty much exactly as the
docstring on ispell-dictionary-alist says.
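
For the record, a minimal sketch of what such a customisation might look
like when written directly in an init file. The thread does not give the
exact values used, so everything other than the OTHERCHARS regexp
"['’]" is an assumption here: the CASECHARS and NOT-CASECHARS classes,
the "-d" argument, and the utf-8 character set are illustrative choices
and should be matched to the dictionary actually in use.

```elisp
;; Sketch only: slot values other than OTHERCHARS are assumptions.
;; Each entry has the form
;;   (DICTIONARY-NAME CASECHARS NOT-CASECHARS OTHERCHARS
;;    MANY-OTHERCHARS-P ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET)
(setq ispell-local-dictionary-alist
      '(;; Default entry, used when no dictionary name is selected.
        (nil "[[:alpha:]]" "[^[:alpha:]]" "['’]" nil nil nil utf-8)
        ;; Entry for the "en_US" dictionary.
        ("en_US" "[[:alpha:]]" "[^[:alpha:]]" "['’]" nil ("-d" "en_US") nil utf-8)))
```

With OTHERCHARS covering both the ASCII apostrophe and the curved quote,
"doesn’t" should be handed to the spell-checker as one word. The
character set matters as well: "’" cannot be represented in a Latin-1
encoding, so a Unicode coding system such as utf-8 is assumed above.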




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sat, 08 Feb 2020 10:07:01 GMT)

Notification sent to "Paul W. Rankin" <hello <at> paulwrankin.com>:
bug acknowledged by developer. (Sat, 08 Feb 2020 10:07:02 GMT)

Message #22 received at 39483-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Paul W. Rankin" <hello <at> paulwrankin.com>
Cc: 39483-done <at> debbugs.gnu.org
Subject: Re: bug#39483: 27.0.60; ispell ignores syntax/category tables word
 boundaries
Date: Sat, 08 Feb 2020 12:06:25 +0200
> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
> Cc: 39483 <at> debbugs.gnu.org
> Date: Sat, 08 Feb 2020 19:28:40 +1000
> 
> As it turns out it was just a case of customising
> ispell-local-dictionary-alist and adding both a default and "en_US"
> entry with OTHERCHARS regexp as "['’]" pretty much exactly as the
> docstring on ispell-dictionary-alist says.

OK, thanks.  With that, I'm closing the bug report.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 07 Mar 2020 12:24:05 GMT)
