GNU bug report logs - #39483
27.0.60; ispell ignores syntax/category tables word boundaries
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 39483 in the body.
You can then email your comments to 39483 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#39483; Package emacs. (Fri, 07 Feb 2020 15:46:01 GMT)
Acknowledgement sent to "Paul W. Rankin" <hello <at> paulwrankin.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 07 Feb 2020 15:46:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hello,
It appears that the function `ispell-get-word' makes its own judgements
on word boundaries, ignoring the buffer's syntax tables and character
categories. This becomes a problem when using `electric-quote-mode' with
ispell, because contractions are parsed as separate words. For example,
calling `ispell-word' for "doesn’t" returns:
T is correct
To reproduce:
1. emacs -Q
2. (in *scratch*) M-x text-mode RET
3. enter text "doesn’t" (i.e. "doesn" C-x 8 ] "t")
4. M-: (modify-syntax-entry ?’ "w")
5. M-: (modify-category-entry ?’ ?^)
6. M-$ | ispell-word
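For anyone retesting this, steps 2 to 5 can be scripted. This is a sketch, not part of the original report: the function name `my-bug-39483-setup' and the buffer name are hypothetical, and ’ is written as \u2019 to sidestep file-encoding issues. Evaluate it in emacs -Q, run M-x my-bug-39483-setup, then press M-$ in the buffer it displays (step 6).

```elisp
;; Hypothetical helper: scripts steps 2-5 of the report.  The function
;; and buffer names are made up for illustration.
(defun my-bug-39483-setup ()
  "Make a Text mode buffer whose only word contains U+2019 as a word char.
Return the buffer; when called interactively, also display it."
  (interactive)
  (let ((buf (get-buffer-create "*bug-39483*")))
    (with-current-buffer buf
      (erase-buffer)
      (text-mode)                               ; step 2
      ;; Step 3: U+2019 RIGHT SINGLE QUOTATION MARK, as C-x 8 ] inserts.
      (insert "doesn\u2019t")
      ;; Step 4: changes the current syntax table, which in Text mode is
      ;; the shared `text-mode-syntax-table'.
      (modify-syntax-entry ?\u2019 "w")
      ;; Step 5: changes the current (by default, standard) category table.
      (modify-category-entry ?\u2019 ?^)
      (goto-char (point-min)))                  ; leave point on the word
    (when (called-interactively-p 'interactive)
      (pop-to-buffer buf))
    buf))
```

In that buffer M-f and M-b should treat the whole contraction as one word, which is what makes the ispell behaviour look surprising.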
Expected results:
Given the above syntax and category tables, M-f | forward-word and M-b |
backward-word now consider "doesn’t" as a single word, and so it should
be passed to `ispell-program-name' and produce the same
result as when checked on the command line:
% echo "doesn’t" | aspell -a
@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.8)
*
% echo "doesn’t" | enchant-2 -a
@(#) International Ispell Version 3.1.20 (but really Enchant 2.2.7)
*
Actual results:
The word "doesn’t" is parsed as "t":
T is correct
Attempts at workarounds:
I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
entries from "[']" to "['’]" to no avail.
Setup:
GNU Emacs 27.0.60 (build 2, x86_64-apple-darwin19.3.0, NS appkit-1894.30
Version 10.15.3 (Build 19D76)) of 2020-02-05
--
Paul W. Rankin
https://www.paulwrankin.com
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39483; Package emacs. (Fri, 07 Feb 2020 18:25:01 GMT)
Message #8 received at 39483 <at> debbugs.gnu.org:
> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
> Date: Sat, 08 Feb 2020 01:44:52 +1000
>
> It appears that the function `ispell-get-word' makes its own judgements
> on word boundaries, ignoring the buffer's syntax tables and character
> categories.
That is true. And I don't really see how it can be any different,
since ispell.el must have the same notion of a word as the underlying
dictionary, otherwise you will have false positives and/or false
negatives, right?
ispell.el looks up the word characters and non-word characters in
its database, and the doc string of ispell-dictionary-base-alist
explains how.
> This becomes a problem when using `electric-quote-mode' with
> ispell, because contractions are parsed as separate words. For example,
> calling `ispell-word' for "doesn’t" returns:
>
> T is correct
>
> To reproduce:
>
> 1. emacs -Q
> 2. (in *scratch*) M-x text-mode RET
> 3. enter text "doesn’t" (i.e. "doesn" C-x 8 ] "t")
> 4. M-: (modify-syntax-entry ?’ "w")
> 5. M-: (modify-category-entry ?’ ?^)
> 6. M-$ | ispell-word
The buffer syntax table has no effect on ispell.el, and shouldn't have
any effect on it.
> Attempts at workarounds:
>
> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
> entries from "[']" to "['’]" to no avail.
That's the right direction, but you didn't follow it far enough.
First, ispell-dictionary-base-alist is the default value, and is used
to produce ispell-dictionary-alist, which is the one you should change
(alternatively, customize ispell-local-dictionary-alist). More
importantly, the definitions of each dictionary include more than just
one character set: there are 3 character sets there and one parameter
for encoding the string passed to the spell-checker, and you should be
sure to set them all as appropriate for the dictionary you use.
My suggestion is to step with Edebug through ispell-get-word and see
why it doesn't consider "doesn’t" as a single word in your case.
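A lighter check than Edebug is to ask both sides which word they see at point. The command below is a hypothetical sketch (`my-compare-word-boundaries' is not part of Emacs). It calls `ispell-set-spellchecker-params' and `ispell-accept-buffer-local-defs' before `ispell-get-word', which should mirror what `ispell-word' does, so it assumes a working spell-checker is installed and may start its process.

```elisp
(require 'ispell)
(require 'thingatpt)

;; Hypothetical diagnostic command; not part of Emacs.
(defun my-compare-word-boundaries ()
  "Show the word ispell would check at point and the syntax-table word."
  (interactive)
  ;; Initialise dictionaries as `ispell-word' does before it calls
  ;; `ispell-get-word'; this may start the spell-checker process.
  (ispell-set-spellchecker-params)
  (ispell-accept-buffer-local-defs)
  (let ((checker-word (car (ispell-get-word nil)))  ; returns (WORD START END)
        (syntax-word (thing-at-point 'word t)))     ; moves with `forward-word'
    (message "ispell-get-word: %S   syntax table: %S"
             checker-word syntax-word)))
```

If the two strings differ, it is the dictionary's character sets, not the buffer's syntax table, that decide what gets sent to the checker.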
> Setup:
>
> GNU Emacs 27.0.60 (build 2, x86_64-apple-darwin19.3.0, NS appkit-1894.30
> Version 10.15.3 (Build 19D76)) of 2020-02-05
This omits crucial information, like the dictionary in use and the
locale-dependent settings that affect encoding. (In any case, I don't
think this list is the right place for discussing this issue.)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39483; Package emacs. (Sat, 08 Feb 2020 05:48:01 GMT)
Message #11 received at 39483 <at> debbugs.gnu.org:
On Sat, Feb 08 2020, Eli Zaretskii wrote:
>> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
>> Attempts at workarounds:
>>
>> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
>> entries from "[']" to "['’]" to no avail.
>
> That's the right direction, but you didn't follow it far enough.
> First, ispell-dictionary-base-alist is the default value, and is used
> to produce ispell-dictionary-alist, which is one you should change
> (alternatively, customize ispell-local-dictionary-alist).
Thanks, that got it.
I'd discussed this on #emacs IRC and the consensus was to report it. Led
astray!!
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39483; Package emacs. (Sat, 08 Feb 2020 08:19:01 GMT)
Message #14 received at 39483 <at> debbugs.gnu.org:
> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
> Cc: 39483 <at> debbugs.gnu.org
> Date: Sat, 08 Feb 2020 15:47:27 +1000
>
> On Sat, Feb 08 2020, Eli Zaretskii wrote:
>
> >> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
> >> Attempts at workarounds:
> >>
> >> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
> >> entries from "[']" to "['’]" to no avail.
> >
> > That's the right direction, but you didn't follow it far enough.
> > First, ispell-dictionary-base-alist is the default value, and is used
> > to produce ispell-dictionary-alist, which is one you should change
> > (alternatively, customize ispell-local-dictionary-alist).
>
> Thanks, that got it.
I'd be interested to see your solution in full, for the record.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#39483; Package emacs. (Sat, 08 Feb 2020 09:29:02 GMT)
Message #17 received at 39483 <at> debbugs.gnu.org:
On Sat, Feb 08 2020, Eli Zaretskii wrote:
>> >> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
>> >> Attempts at workarounds:
>> >>
>> >> I've tried altering slot 3 of the corresponding `ispell-dictionary-base-alist'
>> >> entries from "[']" to "['’]" to no avail.
>> >
>> > That's the right direction, but you didn't follow it far enough.
>> > First, ispell-dictionary-base-alist is the default value, and is used
>> > to produce ispell-dictionary-alist, which is one you should change
>> > (alternatively, customize ispell-local-dictionary-alist).
>>
>> Thanks, that got it.
>
> I'd be interested to see your solution in full, for the record.
I went down the wrong path with syntax tables when I saw M-f/M-b was
stepping through the word like doesn|’|t|, so I figured it was about word
boundaries. Searching through the manual I couldn't find anything in
"(emacs) Quotation Marks" or "(emacs) Spelling" but found the references
to syntax tables regarding word boundaries in "(elisp) Word Motion".
As it turns out it was just a case of customising
ispell-local-dictionary-alist and adding both a default and "en_US"
entry with OTHERCHARS regexp as "['’]" pretty much exactly as the
docstring on ispell-dictionary-alist says.
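For the record, the fix described above might look like this when written with `setq' instead of Customize. It is a sketch under stated assumptions: an "en_US" dictionary, a checker that selects it with "-d" (aspell, hunspell and enchant-2 accept that flag), "[[:alpha:]]" for the letter classes, and utf-8 as CHARACTER-SET, because the iso-8859-1 set used by some `ispell-dictionary-base-alist' entries cannot encode ’. The docstring of `ispell-dictionary-alist' remains the authority on each slot.

```elisp
(require 'ispell)

;; Sketch only: the dictionary name, "-d" and the [[:alpha:]] classes are
;; assumptions.  Slot layout (see `ispell-dictionary-alist'):
;; (NAME CASECHARS NOT-CASECHARS OTHERCHARS MANY-OTHERCHARS-P
;;  ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET)
(setq ispell-local-dictionary-alist
      '(;; Default entry (NAME is nil).
        (nil "[[:alpha:]]" "[^[:alpha:]]" "['\u2019]" nil nil nil utf-8)
        ;; OTHERCHARS "['\u2019]" is the "['’]" from the thread: it lets
        ;; either apostrophe appear inside a word.
        ("en_US" "[[:alpha:]]" "[^[:alpha:]]" "['\u2019]" nil
         ("-d" "en_US") nil utf-8)))

;; Restart the checker, if one is running, so the new sets take effect.
(ispell-kill-ispell t)
```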
Reply sent to Eli Zaretskii <eliz <at> gnu.org>: You have taken responsibility. (Sat, 08 Feb 2020 10:07:01 GMT)
Notification sent to "Paul W. Rankin" <hello <at> paulwrankin.com>: bug acknowledged by developer. (Sat, 08 Feb 2020 10:07:02 GMT)
Message #22 received at 39483-done <at> debbugs.gnu.org:
> From: "Paul W. Rankin" <hello <at> paulwrankin.com>
> Cc: 39483 <at> debbugs.gnu.org
> Date: Sat, 08 Feb 2020 19:28:40 +1000
>
> As it turns out it was just a case of customising
> ispell-local-dictionary-alist and adding both a default and "en_US"
> entry with OTHERCHARS regexp as "['’]" pretty much exactly as the
> docstring on ispell-dictionary-alist says.
OK, thanks. With that, I'm closing the bug report.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 07 Mar 2020 12:24:05 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.