#46933 - Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos

GNU bug report logs - #46933
Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos

Package: emacs;

Reported by: Gregory Heytings <gregory <at> heytings.org>

Date: Thu, 4 Mar 2021 21:22:02 UTC

Severity: normal

Message #8 received at 46933 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gregory Heytings <gregory <at> heytings.org>, Kenichi Handa <handa <at> gnu.org>
Cc: 46933 <at> debbugs.gnu.org
Subject: Re: bug#46933: Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos
Date: Sun, 21 Mar 2021 17:27:45 +0200

> Date: Thu, 04 Mar 2021 21:21:24 +0000
> From: Gregory Heytings <gregory <at> heytings.org>
>
> (Disclaimer: I have no knowledge whatsoever about the ISO-2022-JP
> encoding, and although this looks like a bug, I'm not sure this is
> actually a bug; I report this at the suggestion of Eli in bug#46859.)
>
> I downloaded the file [1], and converted it to the ISO-2022-JP encoding
> with iconv -t iso-2022-jp one.txt > iso-2022-jp.txt. The resulting file
> is attached to this bug report. It ends with two CRLFs, at byte offsets
> 2993 and 2995. However, after emacs -Q iso-2022-jp.txt, with M-:
> (goto-char (filepos-to-bufferpos POS 'exact)) we get:
>
> POS = 2991, 2992: last but one visible character (HIRAGANA LETTER RU)
> POS = 2993, 2994: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2995, 2996: first CRLF
> POS = 2997: second CRLF
> POS = 2998: point-max
> POS = 2999: first CRLF
> POS = 3000, 3001: second CRLF
> POS >= 3002: point-max
>
> I would have expected:
>
> POS = 2989, 2990: last but one visible character (HIRAGANA LETTER RU)
> POS = 2991, 2992: last visible character (IDEOGRAPHIC FULL STOP)
> POS = 2993, 2994: first CRLF
> POS = 2995, 2996: second CRLF
> POS >= 2997: point-max
>
> The opposite operation M-: (bufferpos-to-filepos (- (point) POS) 'exact)
> apparently also has bugs; its return values are not coherent with the
> above ones:
>
> POS = 0: 3003
> POS = 1: 3001
> POS = 2: 2999
> POS = 3 (IDEOGRAPHIC FULL STOP): 2997
> POS = 4 (HIRAGANA LETTER RU): 2995
>
> I would have expected:
>
> POS = 0: 2997
> POS = 1: 2995
> POS = 2: 2993
> POS = 3 (IDEOGRAPHIC FULL STOP): 2991
> POS = 4 (HIRAGANA LETTER RU): 2989
>
> [1] https://darza.com/ecbackend/vendor/symfony/mime/Tests/Fixtures/samples/charsets/iso-2022-jp/one.txt

There's something strange going on here with the encoding of the buffer using iso-2022-jp-dos: near the end of the encoded bytestream, between the encoded HIRAGANA LETTER KO (こ) and HIRAGANA LETTER TO (と), we get 6 extra bytes: "ESC ( B ESC $ B". AFAIU, this sequence means "switch to ASCII, then switch back to Japanese". So together these 6 bytes are a no-op as far as their effect on the text is concerned, but they disrupt the logic of filepos-to-bufferpos because they introduce extra bytes that aren't there in the original file.

Kenichi, why are these 6 bytes inserted by encode-coding-region, but not when we encode the same text as part of saving the buffer to its file? And why does it happen near the end of the text, between those 2 particular letters?
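The effect of the 6 extra bytes described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: it uses Python's `iso2022_jp` codec rather than Emacs's `iso-2022-jp-dos` coding system, a made-up sample string rather than the attached file, and a hypothetical helper `char_offsets` that records the byte offset at which each character starts (escape sequences yield no characters, and CRLF counts as one character, roughly as a -dos end-of-line conversion would). It does not reproduce how filepos-to-bufferpos is actually implemented.

```python
# Illustrative sketch (not Emacs code): a redundant "ESC ( B ESC $ B" pair
# leaves the decoded text unchanged but shifts later byte offsets by 6.

def char_offsets(data: bytes) -> list:
    """Hypothetical helper: byte offset where each character of an
    ISO-2022-JP stream starts.  Escape sequences produce no characters;
    CRLF is counted as one character (like a -dos EOL conversion)."""
    offsets = []
    i = 0
    jis_mode = False  # True while JIS X 0208 (2 bytes per character) is designated
    while i < len(data):
        if data[i] == 0x1B:  # ESC starts a 3-byte designation sequence
            seq = data[i:i + 3]
            if seq in (b"\x1b$B", b"\x1b$@"):
                jis_mode = True
            elif seq in (b"\x1b(B", b"\x1b(J"):
                jis_mode = False
            else:
                raise ValueError(f"unsupported escape sequence at byte {i}")
            i += 3
        elif jis_mode:
            offsets.append(i)  # one 2-byte JIS X 0208 character
            i += 2
        elif data[i:i + 2] == b"\r\n":
            offsets.append(i)  # CRLF treated as a single newline
            i += 2
        else:
            offsets.append(i)  # one 1-byte ASCII character
            i += 1
    return offsets


# Hypothetical sample text (not the file attached to the bug report).
text = "ことは。\r\n"
plain = text.encode("iso2022_jp")

# Splice the no-op pair ESC ( B ESC $ B between こ and と.
cut = plain.index(b"\x1b$B") + 3 + 2  # skip ESC $ B and the 2 bytes of こ
padded = plain[:cut] + b"\x1b(B\x1b$B" + plain[cut:]

# Both byte streams decode to exactly the same text...
assert padded.decode("iso2022_jp") == plain.decode("iso2022_jp") == text

# ...but every character after the splice starts 6 bytes later.
shift = [b - a for a, b in zip(char_offsets(plain), char_offsets(padded))]
print(shift)  # → [0, 6, 6, 6, 6]
```

Note that every bufferpos-to-filepos value observed in the report is exactly 6 bytes larger than the expected one (3003 vs. 2997, 3001 vs. 2995, ..., 2995 vs. 2989), which is consistent with one such redundant escape pair being counted.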
