GNU bug report logs - #33044
Guile misbehaves in the "ja_JP.sjis" locale

Package: guile

Reported by: Tom de Vries <tdevries <at> suse.de>

Date: Mon, 15 Oct 2018 10:43:01 UTC

Severity: normal


Message #19 received at 33044 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Tom de Vries <tdevries <at> suse.de>
Cc: 33044 <at> debbugs.gnu.org
Subject: Re: bug#33044: Guile misbehaves in the "ja_JP.sjis" locale
Date: Tue, 16 Oct 2018 01:13:43 -0400

Mark H Weaver <mhw <at> netris.org> writes:

> Shift_JIS is _mostly_ ASCII-compatible, except that code points 0x5C and
> 0x7E, which represent backslash (\) and tilde (~) in ASCII, are mapped
> to the Yen sign (¥) and overline (‾) in Shift_JIS.  Backslash (\) and
> tilde (~) are multibyte characters in Shift_JIS.

Although I wrote above that "Backslash (\) and tilde (~) are multibyte
characters in Shift_JIS", that was admittedly my assumption, based on
the absence of those characters in the "First byte" map shown here:

  https://en.wikipedia.org/wiki/Shift_JIS#As_defined_in_JIS_X_0208:1997

However, now I'm unsure.  I've spent some time attempting to find the
Shift_JIS encodings for backslash and tilde, but I've not yet found an
answer.

I've asked Emacs 26 to write a file containing backslashes and Yen signs
using the "shift_jis" encoding, and both characters seem to be mapped to
the same code: 0x5C.

I've also used the 'iconv' utility from GNU libc to convert backslashes
and Yen signs to Shift_JIS, and it also maps both characters to the
same code, 0x5C:

--8<---------------cut here---------------start------------->8---
mhw <at> jojen ~$ echo '\\¥¥' | iconv -f UTF-8 -t SHIFT-JIS > Shift_JIS_test.txt
mhw <at> jojen ~$ hexdump -C Shift_JIS_test.txt
00000000  5c 5c 5c 5c 0a                                    |\\\\.|
00000005
--8<---------------cut here---------------end--------------->8---
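
The same question can be cross-checked without iconv. What follows is
only a minimal sketch, assuming Python 3 and its built-in "shift_jis"
codec, whose mapping tables need not agree with those of GNU libc or
Emacs. The helper names `encoded_hex` and `report` are made up for this
illustration. It reports which byte sequence, if any, the codec produces
for backslash, Yen sign, tilde and overline:

```python
# Sketch: see which bytes a codec produces for backslash, Yen sign,
# tilde and overline.  Uses only Python's built-in codecs; the helper
# names below are hypothetical, chosen for this illustration.

CHARS = {
    "backslash": "\u005c",
    "yen sign": "\u00a5",
    "tilde": "\u007e",
    "overline": "\u203e",
}


def encoded_hex(ch, codec="shift_jis"):
    """Return the encoded bytes as a hex string, or None if the codec rejects ch."""
    try:
        return ch.encode(codec).hex()
    except UnicodeEncodeError:
        return None


def report(codec="shift_jis"):
    """Map each character name to its hex encoding (or None if unencodable)."""
    return {name: encoded_hex(ch, codec) for name, ch in CHARS.items()}


if __name__ == "__main__":
    for name, hexbytes in report().items():
        print(f"{name:10} -> {hexbytes}")
```

If two names come out with the same hex string, that codec folds both
characters onto one byte, which is the behaviour iconv shows above for
backslash and the Yen sign.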

While investigating, I found this bug report asking for an SJIS locale
to be added to GNU libc; the developers were strongly opposed:

  https://bugzilla.redhat.com/show_bug.cgi?id=136290

At this point, I'm inclined to believe that Shift_JIS is not suitable as
a locale encoding on POSIX systems, and that we should not try to
support it in Guile.

What do you think?

Can you tell me how backslash and tilde are represented in Shift_JIS?

     Regards,
       Mark




This bug report was last modified 6 years and 238 days ago.
