#7668 - ispell and dictionary encodings

GNU bug report logs - #7668
ispell and dictionary encodings

Package: emacs

Reported by: Reuben Thomas <rrt <at> sc3d.org>

Date: Fri, 17 Dec 2010 18:25:01 UTC

Severity: normal


From: Agustin Martin <agustin.martin <at> hispalinux.es>
To: Reuben Thomas <rrt <at> sc3d.org>, 7668 <at> debbugs.gnu.org
Subject: bug#7668: ispell and dictionary encodings
Date: Mon, 20 Dec 2010 12:31:48 +0100

On Fri, Dec 17, 2010 at 06:30:14PM +0000, Reuben Thomas wrote:
> I've just been puzzling my way through ispell.gz's dictionary encoding
> code, after switching from aspell to hunspell in order to be able to
> treat Unicode curly single quotes as normal intraword punctuation
> (which it seems aspell cannot be persuaded to do, but that's another
> story).
>
> I noticed a feature of ispell-dictionary-base-alist, which I don't
> understand: the last (7th) element of each dictionary definition is
> called "Coding System", which seems to be the coding system of the
> case character and non-case-character strings, but it is also passed
> to the spelling program as the input encoding, which is wrong, since
> the input encoding depends on the file to be checked.

That element represents the coding system that will be used for
communication with the dictionary. The case-character and
non-case-character strings should be in that same encoding.

> I currently use the classic workaround of making up my own dictionary
> definition which includes accented characters that I want to be able
> to use in words (which is necessary anyway), and which specifies utf-8
> as the coding system. This only works because I use utf-8 for all my
> text files.

If you are not going to use XEmacs, but only FSF Emacs, just use
[:alpha:] for the case-character and non-case-character strings along
with utf-8. That is already done automatically for aspell dictionaries,
where it is easy to get a list of installed dictionaries and additional
info.

> It seems, therefore, that the argument to follow
> ispell-encoding8-command (which itself is mis-documented:
>
> Command line option prefix to select UTF-8 if supported, nil otherwise.
> If UTF-8 if supported by spellchecker and is selectable from the command line
> this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
> so UTF-8 or other mime charsets can be selected. That will be set for hunspell
> >=1.1.6 or aspell >= 0.60 in `ispell-check-version'.
>
> It is not just for selecting UTF-8; indeed, that's the irony: in the
> default configuration it's used mostly to select 8-bit character sets!
> And there are one or two other typos. How about (suitably rewrapped):
>
> Command line option prefix to select coding system if supported, nil otherwise.
> If the coding system is selectable from the command line
> this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
> so that the input encoding can be selected. That will be set for hunspell
> >= 1.1.6 or aspell >= 0.60 in `ispell-check-version'.

Agreed, thanks.

> Then, the following code in ispell-start-process:
>
>   ;; If we are using recent aspell or hunspell, make sure we use the right encoding
>   ;; for communication. ispell or older aspell/hunspell does not support this
>   (if ispell-encoding8-command
>       (setq args
>             (append args
>                     (list
>                      (concat ispell-encoding8-command
>                              (symbol-name (ispell-get-coding-system)))))))
>
> needs fixing: rather than using ispell-get-coding-system, it should
> use a prefix of buffer-file-coding-system (without the suffix that
> specifies the line ending).

No, the current code is correct. It tells the spellchecker that
communication with the dictionary will be done in the
(ispell-get-coding-system) coding system. ispell.el does the internal
conversions needed for that in a different place, so everything is
transparent to the user.

> I'm sure I'm missing things here, but if what I've said above makes
> any sense, I'd like to help refine it into a sensible proposal to
> improve ispell.el.

Thanks for looking into this. I will prepare a change with the
`ispell-encoding8-command' documentation fix.

Regards,

-- 
Agustin
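
For illustration, the suggestion in the reply above to combine [:alpha:]
with utf-8 might look roughly like the following in an init file. This is
only a sketch under assumptions, not code taken from ispell.el: the
dictionary name "en_GB-curly", the "-d" "en_GB" arguments (which presume
that a hunspell en_GB dictionary is installed) and the otherchars value
are all made up for the example, and whether the curly quote is really
treated as intraword punctuation also depends on the spellchecker and its
dictionary files.

```elisp
;; Sketch only: a user-defined dictionary entry for FSF Emacs.
;; The name "en_GB-curly" and the hunspell arguments are hypothetical.
(require 'ispell)

(setq ispell-program-name "hunspell")

(add-to-list 'ispell-local-dictionary-alist
             '("en_GB-curly"     ; dictionary name (hypothetical)
               "[[:alpha:]]"     ; case-character string
               "[^[:alpha:]]"    ; non-case-character string
               "['’]"            ; other characters allowed inside words
               nil               ; many-otherchars-p
               ("-d" "en_GB")    ; extra arguments for the spellchecker
               nil               ; extended character mode
               utf-8))           ; coding system used to talk to the spellchecker
```

With an entry like this, the last element is the coding system that
ispell.el uses when talking to the spellchecker process, which is why the
character strings in the entry should be written in that same encoding.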

This bug report was last modified 14 years and 237 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
