GNU bug report logs - #17437
24.3; ispell uses typographically correct apostrophe as word boundary
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17437 in the body.
You can then email your comments to 17437 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#17437; Package emacs. (Thu, 08 May 2014 16:04:01 GMT)
Acknowledgement sent to "Tobias Getzner" <tobias.getzner <at> gmx.de>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 08 May 2014 16:04:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
When using the typographically correct apostrophe (“right single
quotation mark” U+2019), ispell will mark up parts of words as typos.
E.g., in “doesn’t”, the part before the apostrophe will be highlighted
as a typo even if the spell-checker supports the apostrophe.
This bug occurs irrespective of the spell-checker, so I suppose that
ispell does its own tokenization and uses the apostrophe as a word
boundary. Instead, the apostrophe should correctly be treated as
word-internal punctuation and handed on to the actual spell-checker
program.
Best regards,
Tobias
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#17437; Package emacs. (Thu, 08 May 2014 17:39:02 GMT)
Message #8 received at 17437 <at> debbugs.gnu.org:
On Thu, May 08, 2014 at 02:15:17PM +0200, Tobias Getzner wrote:
>
> When using the typographically correct apostrophe (“right single
> quotation mark” U+2019), ispell will mark up parts of words as typos.
> E.g., in “doesn’t”, the part before the apostrophe will be highlighted
> as a typo even if the spell-checker supports the apostrophe.
>
> This bug occurs irrespective of the spell-checker, so I suppose that
> ispell does its own tokenization and uses the apostrophe as a word
> boundary. Instead, the apostrophe should correctly be treated as
> word-internal punctuation and handed on to the actual spell-checker
> program.
Which language are you using? Whether or not the apostrophe is a wordchar
depends on the language. By the way, "doesn't" is working well here with
aspell+american.
--
Agustin
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#17437; Package emacs. (Fri, 09 May 2014 05:08:01 GMT)
Message #11 received at 17437 <at> debbugs.gnu.org:
Hello Agustin,
> Which language are you using? Whether or not the apostrophe is a
> wordchar depends on the language. By the way, "doesn't" is working
> well here with aspell+american.
Please note that the bug is not about the single quote apostrophe,
U+0027, but concerns the typographically correct apostrophe, U+2019.
Both hunspell and aspell support it in recent versions, but Emacs
fails to correctly hand over words containing the typographical
apostrophe.
Regards,
Tobias
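The distinction between the two code points can be illustrated with a minimal Python sketch. It is not ispell.el's code; the `tokenize` helper and its `apostrophes` parameter are hypothetical names invented for this illustration. It assumes a tokenizer that accepts only a configured set of apostrophe characters inside a word, which is enough to show why U+0027 can work while U+2019 splits the word.

```python
import re
import unicodedata

ASCII_APOSTROPHE = "\u0027"        # APOSTROPHE
TYPOGRAPHIC_APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def tokenize(text, apostrophes):
    """Split text into words (hypothetical helper, not part of ispell.el).

    A word is a run of letters, optionally joined by one of the
    characters in `apostrophes` (which must be non-empty).
    """
    letter = r"[^\W\d_]"  # any Unicode letter
    joiner = "[" + re.escape(apostrophes) + "]"
    pattern = re.compile(f"{letter}+(?:{joiner}{letter}+)*")
    return pattern.findall(text)


if __name__ == "__main__":
    for ch in (ASCII_APOSTROPHE, TYPOGRAPHIC_APOSTROPHE):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    # Tokenizer that knows only U+0027: the U+2019 word is split in two.
    print(tokenize("doesn\u2019t", ASCII_APOSTROPHE))
    # Tokenizer that knows both characters keeps the word whole.
    print(tokenize("doesn\u2019t", ASCII_APOSTROPHE + TYPOGRAPHIC_APOSTROPHE))
```

With only U+0027 configured, the fragment before the apostrophe ("doesn") is what would reach the spell-checker, which matches the behaviour described in the report.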
Added tag(s) moreinfo. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Mon, 19 May 2014 06:25:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#17437; Package emacs. (Tue, 22 Jul 2014 09:43:02 GMT)
Message #16 received at 17437 <at> debbugs.gnu.org:
> More details are needed here, e.g., what language is the reporter using?
I suppose you are referring to the selected ispell dictionary? I am not
explicitly setting the ispell dictionary, so (I presume) ispell.el will
not pass a dictionary to hunspell, which will accordingly use the
default one for my locale, i.e., en_US. While some dictionaries have
issues with U+2019 (and in fact most are still encoded in latin-1 :-/),
I have added this character to WORDCHARS in my hunspell en_US
dictionary; hunspell now correctly recognizes words using this character
when invoked in a terminal. Sadly, it seems ispell.el
still won’t handle these, however.
The problem seems to be that ispell.el still thinks that U+2019 is a
word boundary and doesn’t pass the whole word on to the spell checker.
Is this likely? Looking at ispell.el, it looks like it is doing word
boundary parsing on its own‽ If so, U+2019 should be treated as a
word-character when it appears in the context of two alphabetical
characters (at least for most western languages).
Best regards,
Tobias
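The contextual rule proposed in this message (U+2019 counts as a word character only when it sits between two alphabetical characters) can be sketched in Python as follows. This is an illustration under that stated assumption only, not a description of how ispell.el parses words; `is_word_char` and `split_words` are hypothetical names.

```python
APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def is_word_char(text, i):
    """Return True if text[i] belongs to a word under the proposed rule."""
    ch = text[i]
    if ch.isalpha():
        return True
    if ch == APOSTROPHE:
        # Word-internal only when flanked by letters on both sides.
        return (
            0 < i < len(text) - 1
            and text[i - 1].isalpha()
            and text[i + 1].isalpha()
        )
    return False


def split_words(text):
    """Collect maximal runs of word characters (hypothetical helper)."""
    words, current = [], []
    for i in range(len(text)):
        if is_word_char(text, i):
            current.append(text[i])
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    print(split_words("It doesn\u2019t work"))
    # U+2019 used as a closing quotation mark is not absorbed into the word.
    print(split_words("\u2018quoted\u2019 text"))
```

Because the same code point doubles as a closing single quotation mark, checking both neighbours keeps a quoted word from picking up the trailing quote. One consequence of this simple rule is that a trailing possessive apostrophe (as in "dogs’") is also left outside the word.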
Removed tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sat, 26 Dec 2015 15:20:02 GMT)
Reply sent to Alan Third <alan <at> idiocy.org>: You have taken responsibility. (Fri, 06 Dec 2019 11:00:03 GMT)
Notification sent to "Tobias Getzner" <tobias.getzner <at> gmx.de>: bug acknowledged by developer. (Fri, 06 Dec 2019 11:00:03 GMT)
Message #23 received at 17437-done <at> debbugs.gnu.org:
Tobias Getzner <tobias.getzner <at> gmx.de> writes:
> I have added this character to WORDCHARS in my hunspell en_US
> dictionary; hunspell now correctly recognizes words using this character
> when invoking hunspell in a terminal. Sadly, it seems ispell.el
> still won’t handle these, however.
I believe this has been fixed since this bug report was raised. Emacs is
now able to scan hunspell's dictionary files and make use of WORDCHARS.
I can't remember if it's fixed in Emacs 25, but it's definitely fixed in
26.
I'll close this bug report, but if you're still experiencing the problem
please reply and we can reopen it.
--
Alan Third
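To make the idea of scanning a dictionary file for WORDCHARS concrete, here is a rough Python sketch. It is not Emacs's implementation; the `read_wordchars` helper and the sample affix text are made up for illustration, and the sketch assumes the affix file has already been decoded to text and declares the characters on a single `WORDCHARS` line.

```python
def read_wordchars(aff_text):
    """Return the characters on the WORDCHARS line, or "" if there is none.

    Hypothetical helper: expects the already-decoded contents of a
    hunspell .aff file.
    """
    for line in aff_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "WORDCHARS":
            return fields[1]
    return ""


# Made-up affix fragment with U+2019 added to WORDCHARS, as the
# reporter describes doing for en_US.
SAMPLE_AFF = (
    "SET UTF-8\n"
    "TRY esianrtolcdugmphbyfvkwz\n"
    "WORDCHARS 0123456789\u2019\n"
)

if __name__ == "__main__":
    extra = read_wordchars(SAMPLE_AFF)
    print("\u2019" in extra)
```

A caller could then treat every character returned here as word-internal when deciding which span of text to hand to the spell-checker.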
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 03 Jan 2020 12:24:04 GMT)
This bug report was last modified 5 years and 165 days ago.