#19653 - ispell misalignment with hunspell when Unicode apostrophe is used

GNU bug report logs - #19653
ispell misalignment with hunspell when Unicode apostrophe is used

Package: emacs;

Reported by: Tobias Getzner <tobias.getzner <at> gmx.de>

Date: Thu, 22 Jan 2015 14:41:02 UTC

Severity: normal

Tags: moreinfo

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

Message #23 received at 19653 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Joseph Mingrone <jrm <at> ftfl.ca>
Cc: 19653 <at> debbugs.gnu.org
Subject: Re: bug#19653: ispell misalignment with hunspell when Unicode apostrophe is used
Date: Fri, 21 Oct 2016 10:33:10 +0300

> From: Joseph Mingrone <jrm <at> ftfl.ca>
> Date: Fri, 21 Oct 2016 02:04:58 -0300
>
> This still seems to be a problem with hunspell version 1.3.3.
>
> The problem can be reproduced by spell checking a file with this one line.
>
> alsdk ✅ sdfkjdsf sldksdfkjsfd
>
> During spell checking, the process list shows:
>
> ispell run -- -- /usr/local/bin/hunspell -a -d en_CA -i UTF-8
>
> The error Emacs (version 25.1.1) reports is:
>
> ispell-process-line: Ispell misalignment: word ‘sdfkjdsf’ point 11; probably incompatible versions

Did Hunspell ever fix the problem whereby it reported byte offsets of the misspelled words, as opposed to character offsets? If not, that is your problem, and Hunspell should finally get its act together.

To see whether this is the problem, invoke Hunspell like this:

  /usr/local/bin/hunspell -a -d en_CA -i UTF-8 < test.txt

and see what Hunspell emits. It should emit something like this (the below is taken from my system, and I don't have the en_CA dictionary, so your output might be slightly different):

  @(#) International Ispell Version 3.2.06 (but really Hunspell 1.3.2)
  & alsdk 3 0: Alaska, elastic, Alston
  & sdfkjdsf 2 8: artefact's, postfix
  & sldksdfkjsfd 2 17: justification, staphylococcus

The second number after each misspelled word is the offset of that word's beginning, measured in characters, from the start of the line. Hunspell used to report this in bytes instead of characters; if it still does, you will have to patch it to fix that bug. AFAIR, the Hunspell issue tracker includes several patches for this bug. Or maybe the latest Hunspell 1.4.1 already fixes this, in which case please upgrade.
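To make the byte-versus-character distinction concrete, here is a minimal Python sketch that computes both kinds of offset for the test line from the report. It is only an illustration and is not code from Emacs or Hunspell; the helper names `word_offsets` and `byte_to_char_offset` are made up for this example, and it assumes the text is UTF-8 encoded and that words are separated by whitespace.

```python
def word_offsets(line, encoding="utf-8"):
    """Return (word, char_offset, byte_offset) for each whitespace-separated word.

    Hypothetical helper for illustration only.
    """
    results = []
    pos = 0
    for word in line.split():
        # Character offset: index of the word in the decoded string.
        start = line.index(word, pos)
        # Byte offset: length of the encoded text that precedes the word.
        byte_start = len(line[:start].encode(encoding))
        results.append((word, start, byte_start))
        pos = start + len(word)
    return results


def byte_to_char_offset(line, byte_offset, encoding="utf-8"):
    """Convert a byte offset into a character offset.

    Assumes byte_offset falls on a character boundary; otherwise
    decoding the truncated byte string raises UnicodeDecodeError.
    """
    return len(line.encode(encoding)[:byte_offset].decode(encoding))


if __name__ == "__main__":
    line = "alsdk ✅ sdfkjdsf sldksdfkjsfd"
    for word, chars, raw in word_offsets(line):
        print(f"{word}: char offset {chars}, byte offset {raw}")
```

The check mark occupies one character but three bytes in UTF-8, so every word after it has a byte offset two larger than its character offset: `sdfkjdsf` sits at character offset 8 but byte offset 10. If Emacs counts buffer positions from 1, a byte offset of 10 would be consistent with the "point 11" in the misalignment error above, though that reading is an inference and not something the report states.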

This bug report was last modified 8 years and 215 days ago.

