GNU bug report logs - #17437
24.3; ispell uses typographically correct apostrophe as word boundary

Package: emacs

Reported by: "Tobias Getzner" <tobias.getzner <at> gmx.de>

Date: Thu, 8 May 2014 16:04:01 UTC

Severity: normal

Found in version 24.3

Done: Alan Third <alan <at> idiocy.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17437 in the body.
You can then email your comments to 17437 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#17437; Package emacs. (Thu, 08 May 2014 16:04:01 GMT)

Acknowledgement sent to "Tobias Getzner" <tobias.getzner <at> gmx.de>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 08 May 2014 16:04:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Tobias Getzner" <tobias.getzner <at> gmx.de>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.3; ispell uses typographically correct apostrophe as word boundary
Date: Thu, 8 May 2014 14:15:17 +0200
When using the typographically correct apostrophe (“right single
quotation mark” U+2019), ispell will mark up parts of words as typos.
E.g., in “doesn’t”, the part before the apostrophe will be highlighted
as a typo even if the spell-checker supports the apostrophe.

This bug occurs irrespective of the spell-checker, so I suppose that
ispell does its own tokenization and uses the apostrophe as a word
boundary. Instead, the apostrophe should correctly be treated as
word-internal punctuation and handed on to the actual spell-checker
program.
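
To illustrate the suspected behaviour, here is a minimal Python sketch. It is not ispell.el's actual logic; the `tokenize` function and its `extra_wordchars` parameter are hypothetical names invented for this example. It only shows how a tokenizer that does not count U+2019 as a word character hands the checker "doesn" and "t" instead of the whole word.

```python
import re

APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK

def tokenize(text, extra_wordchars=""):
    """Split text into runs of ASCII letters plus any extra_wordchars."""
    # Hypothetical tokenizer for illustration only; not ispell.el's code.
    pattern = "[A-Za-z" + re.escape(extra_wordchars) + "]+"
    return re.findall(pattern, text)

text = "It doesn\u2019t work"
split_words = tokenize(text)              # U+2019 acts as a word boundary
whole_words = tokenize(text, APOSTROPHE)  # U+2019 is word-internal

print(split_words)
print(whole_words)
```

With the apostrophe excluded, the checker only ever sees the fragment before it, which matches the highlighting described above.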

Best regards,
Tobias
 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17437; Package emacs. (Thu, 08 May 2014 17:39:02 GMT)

Message #8 received at 17437 <at> debbugs.gnu.org:

From: Agustin Martin <agustin.martin <at> hispalinux.es>
To: Tobias Getzner <tobias.getzner <at> gmx.de>, 17437 <at> debbugs.gnu.org
Subject: Re: bug#17437: 24.3; ispell uses typographically correct apostrophe
 as word boundary
Date: Thu, 8 May 2014 19:38:44 +0200
On Thu, May 08, 2014 at 02:15:17PM +0200, Tobias Getzner wrote:
> 
> When using the typographically correct apostrophe (“right single
> quotation mark” U+2019), ispell will mark up parts of words as typos.
> E.g., in “doesn’t”, the part before the apostrophe will be highlighted
> as a typo even if the spell-checker supports the apostrophe.
> 
> This bug occurs irrespective of the spell-checker, so I suppose that
> ispell does its own tokenization and uses the apostrophe as a word
> boundary. Instead, the apostrophe should correctly be treated as
> word-internal punctuation and handed on to the actual spell-checker
> program.

Which language are you using? Whether or not the apostrophe is a wordchar
depends on the language. By the way, "doesn't" is working well here with
aspell+american.

-- 
Agustin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17437; Package emacs. (Fri, 09 May 2014 05:08:01 GMT)

Message #11 received at 17437 <at> debbugs.gnu.org:

From: "Tobias Getzner" <tobias.getzner <at> gmx.de>
To: "Agustin Martin" <agustin.martin <at> hispalinux.es>
Cc: 17437 <at> debbugs.gnu.org
Subject: Re: bug#17437: 24.3; ispell uses typographically correct apostrophe
 as word boundary
Date: Fri, 9 May 2014 07:07:11 +0200
Hello Agustin,

> Which language are you using? Whether or not the apostrophe is a
> wordchar depends on the language. By the way, "doesn't" is working
> well here with aspell+american.

Please note that the bug is not about the single quote apostrophe,
U+0027, but concerns the typographically correct apostrophe, U+2019.

Both hunspell and aspell support it in recent versions, but Emacs
fails to correctly hand over words containing the typographical
apostrophe.

Regards,
Tobias




Added tag(s) moreinfo. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Mon, 19 May 2014 06:25:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17437; Package emacs. (Tue, 22 Jul 2014 09:43:02 GMT)

Message #16 received at 17437 <at> debbugs.gnu.org:

From: Tobias Getzner <tobias.getzner <at> gmx.de>
To: 17437 <at> debbugs.gnu.org
Subject: Re: bug#17437: 24.3; ispell uses typographically correct apostrophe
Date: Tue, 22 Jul 2014 11:42:21 +0200
> More details are needed here, e.g., what language is the reporter using?

I suppose you are referring to the selected ispell dictionary? I am not
explicitly setting the ispell dictionary, so (I presume) ispell.el will
not pass a dictionary to hunspell, which will accordingly use the
default one for my locale, i.e., en_US. While some dictionaries have
issues with U+2019 (and in fact most are still encoded in latin-1 :-/),
I have added this character to WORDCHARS in my hunspell en_US
dictionary; hunspell now correctly recognizes words using this character
when invoked in a terminal. Sadly, it seems ispell.el still won’t
handle these.

The problem seems to be that ispell.el still thinks that U+2019 is a
word boundary and doesn’t pass the whole word on to the spell checker.
Is this likely? Looking at ispell.el, it looks like it is doing word
boundary parsing on its own‽ If so, U+2019 should be treated as a
word-character when it appears in the context of two alphabetical
characters (at least for most western languages).
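
The contextual rule proposed here can be sketched in Python as follows. This is an illustration under stated assumptions, not a patch for ispell.el: the names `LETTERS`, `WORD` and `words` are invented for the example, and "alphabetical" is approximated by the regex class `[^\W\d_]`.

```python
import re

APOSTROPHE = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Roughly "a run of letters": word characters minus digits and underscore.
LETTERS = r"[^\W\d_]+"

# Hypothetical rule: U+2019 joins a word only when letters follow it,
# so it counts as word-internal only between two alphabetical characters.
WORD = re.compile(f"{LETTERS}(?:{APOSTROPHE}{LETTERS})*")

def words(text):
    """Return the words that would be handed to the spell checker."""
    return WORD.findall(text)

# U+2018/U+2019 used as quotation marks, plus a trailing possessive mark.
sample = "\u2018Don\u2019t\u2019 isn\u2019t the dogs\u2019 word"
found = words(sample)
print(found)
```

Under this rule "Don’t" and "isn’t" stay whole, while a U+2019 that closes a quotation or trails "dogs" is left outside the word.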

Best regards,
Tobias






Removed tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sat, 26 Dec 2015 15:20:02 GMT)

Reply sent to Alan Third <alan <at> idiocy.org>:
You have taken responsibility. (Fri, 06 Dec 2019 11:00:03 GMT)

Notification sent to "Tobias Getzner" <tobias.getzner <at> gmx.de>:
bug acknowledged by developer. (Fri, 06 Dec 2019 11:00:03 GMT)

Message #23 received at 17437-done <at> debbugs.gnu.org:

From: Alan Third <alan <at> idiocy.org>
To: Tobias Getzner <tobias.getzner <at> gmx.de>
Cc: 17437-done <at> debbugs.gnu.org
Subject: Re: bug#17437: 24.3; ispell uses typographically correct apostrophe
Date: Fri, 06 Dec 2019 10:59:40 +0000
Tobias Getzner <tobias.getzner <at> gmx.de> writes:

> I have added this character to WORDCHARS in my hunspell en_US
> dictionary; hunspell now correctly recognizes words using this character
> when invoked in a terminal. Sadly, it seems ispell.el still won’t
> handle these.

I believe this has been fixed since this bug report was raised. Emacs is
now able to scan hunspell's dictionary files and make use of WORDCHARS.
I can't remember if it's fixed in Emacs 25, but it's definitely fixed in
26.
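
As a rough illustration of what making use of WORDCHARS involves, the Python sketch below pulls the WORDCHARS entry out of hunspell affix-file text. It is not Emacs's implementation; `read_wordchars` is a hypothetical helper and the affix excerpt is invented for the example.

```python
def read_wordchars(aff_text):
    """Return the characters on the first WORDCHARS line, or "" if none."""
    for line in aff_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "WORDCHARS":
            return fields[1]
    return ""

# Invented affix-file excerpt; a real en_US.aff contains much more.
sample_aff = (
    "SET UTF-8\n"
    "TRY esianrtolcdugmphbyfvkwz\n"
    "WORDCHARS 0123456789\u2019\n"
)

wordchars = read_wordchars(sample_aff)
print("\u2019" in wordchars)
```

A front end that reads this set can treat U+2019 as part of a word whenever the dictionary itself declares it to be one.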

I'll close this bug report, but if you're still experiencing the problem
please reply and we can reopen it.
-- 
Alan Third




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 03 Jan 2020 12:24:04 GMT)