GNU bug report logs - #7668
ispell and dictionary encodings


Package: emacs;

Reported by: Reuben Thomas <rrt <at> sc3d.org>

Date: Fri, 17 Dec 2010 18:25:01 UTC

Severity: normal




From: Agustin Martin <agustin.martin <at> hispalinux.es>
To: Reuben Thomas <rrt <at> sc3d.org>, 7668 <at> debbugs.gnu.org
Subject: bug#7668: ispell and dictionary encodings
Date: Mon, 20 Dec 2010 12:31:48 +0100
On Fri, Dec 17, 2010 at 06:30:14PM +0000, Reuben Thomas wrote:
> I've just been puzzling my way through ispell.el's dictionary encoding
> code, after switching from aspell to hunspell in order to be able to
> treat Unicode curly single quotes as normal intraword punctuation
> (which it seems aspell cannot be persuaded to do, but that's another
> story).
> 
> I noticed a feature of ispell-dictionary-base-alist, which I don't
> understand: the last (7th) element of each dictionary definition is
> called "Coding System", which seems to be the coding system of the
> case character and non-case-character strings, but it is also passed
> to the spelling program as the input encoding, which is wrong, since
> the input encoding depends on the file to be checked.

That element represents the coding system that will be used for communication
with the dictionary. The case-character and non-case-character strings should
be in that same encoding.

> I currently use the classic workaround of making up my own dictionary
> definition which includes accented characters that I want to be able
> to use in words (which is necessary anyway), and which specifies utf-8
> as the coding system. This only works because I use utf-8 for all my
> text files.

If you are not going to use XEmacs, but only FSF Emacs, just use [:alpha:]
for the case-character and non-case-character strings along with utf-8. That
is already done automatically for aspell dictionaries, for which it is easy
to get a list of installed dictionaries and additional info.
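For illustration only, here is a sketch of such an entry for hunspell, set through `ispell-local-dictionary-alist'. It assumes that a hunspell dictionary called "en_GB" is installed. The entry name "en_GB-utf8" is a hypothetical label. The fields follow the documented order of `ispell-dictionary-alist' entries, and the last field is the "Coding System" element discussed above. The curly apostrophe in OTHERCHARS reflects the use case in the original report. Whether hunspell then accepts such words may also depend on the hunspell dictionary itself.

```elisp
;; Sketch only, for FSF Emacs (the message above restricts the
;; [:alpha:] approach to it).  "en_GB-utf8" and "en_GB" are
;; hypothetical names; adjust them to an installed dictionary.
(setq ispell-program-name "hunspell")
(setq ispell-local-dictionary-alist
      '(("en_GB-utf8"        ; DICTIONARY-NAME as seen inside Emacs
         "[[:alpha:]]"       ; CASECHARS: any letter
         "[^[:alpha:]]"      ; NOT-CASECHARS: anything that is not a letter
         "['’]"              ; OTHERCHARS: ASCII and curly apostrophes
         nil                 ; MANY-OTHERCHARS-P
         ("-d" "en_GB")      ; ISPELL-ARGS passed to hunspell
         nil                 ; EXTENDED-CHARACTER-MODE
         utf-8)))            ; CHARACTER-SET: encoding used to talk to hunspell
```

After evaluating this (for example from the init file), M-x ispell-change-dictionary RET en_GB-utf8 RET should select the entry.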

> It seems, therefore, that the argument to follow
> ispell-encoding8-command (which itself is mis-documented:
> 
> Command line option prefix to select UTF-8 if supported, nil otherwise.
> If UTF-8 if supported by spellchecker and is selectable from the command line
> this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
> so UTF-8 or other mime charsets can be selected.  That will be set for hunspell
> >=1.1.6 or aspell >= 0.60 in `ispell-check-version'.
> 
> It is not just for selecting UTF-8; indeed, that's the irony: in the
> default configuration it's used mostly to select 8-bit character sets!
> And there are one or two other typos. How about (suitably rewrapped):
> 
> Command line option prefix to select coding system if supported, nil otherwise.
> If the coding system is selectable from the command line
> this variable will contain \"--encoding=\" for aspell and \"-i \" for hunspell,
> so that the input encoding can be selected.  That will be set for hunspell
> >= 1.1.6 or aspell >= 0.60 in `ispell-check-version'.

Agreed, thanks

> Then, the following code in ispell-start-process:
> 
>     ;; If we are using recent aspell or hunspell, make sure we use the
>     ;; right encoding for communication. ispell or older aspell/hunspell
>     ;; does not support this
>     (if ispell-encoding8-command
> 	(setq args
> 	      (append args
> 		      (list
> 		       (concat ispell-encoding8-command
> 			       (symbol-name (ispell-get-coding-system)))))))
> 
> needs fixing: rather than using ispell-get-coding-system, it should
> use a prefix of buffer-file-coding-system (without the suffix that
> specifies the line ending).

No, the current code is correct. It tells the spellchecker that
communication with the dictionary will be done in the
(ispell-get-coding-system) coding system. ispell.el does the internal
conversions needed for that in a different place, so everything is
transparent to the user.
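The sketch below is a minimal illustration of why the file's coding system does not matter here. It relies only on the standard `encode-coding-string' function. Text in an Emacs buffer is held as characters, and the bytes that reach the spellchecker depend solely on the coding system chosen for that conversion. The helper name `my-encoded-lengths' is hypothetical. Also unverified here: that ispell.el does the equivalent conversion by setting the spellchecker process's coding system (via `set-process-coding-system') to (ispell-get-coding-system).

```elisp
(defun my-encoded-lengths (word)
  "Return the byte lengths of WORD encoded as Latin-1 and as UTF-8.
Hypothetical helper for illustration.  The buffer's
`buffer-file-coding-system' plays no part in either conversion."
  (list (length (encode-coding-string word 'iso-8859-1))  ; ï -> one byte #xEF
        (length (encode-coding-string word 'utf-8))))     ; ï -> two bytes #xC3 #xAF

(my-encoded-lengths "naïve")  ; → (5 6)
```

The same five-character word becomes five or six bytes depending only on the dictionary's coding system, which is what the CHARACTER-SET element selects.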

> I'm sure I'm missing things here, but if what I've said above makes
> any sense, I'd like to help refine it into a sensible proposal to
> improve ispell.el.

Thanks for looking into this. Will prepare a change with the
`ispell-encoding8-command' documentation fix.

Regards,

-- 
Agustin




This bug report was last modified 14 years and 177 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.