GNU bug report logs - #46933
Possible bugs in filepos-to-bufferpos / bufferpos-to-filepos


Package: emacs;

Reported by: Gregory Heytings <gregory <at> heytings.org>

Date: Thu, 4 Mar 2021 21:22:02 UTC

Severity: normal



Message #26 received at 46933 <at> debbugs.gnu.org:

From: handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gregory <at> heytings.org, 46933 <at> debbugs.gnu.org
Subject: Re: bug#46933: Possible bugs in filepos-to-bufferpos /
 bufferpos-to-filepos
Date: Sun, 28 Mar 2021 23:29:41 +0900

In article <83pmzkog6x.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:

> > How about something like this method:
> > 1. Encode the buffer text line by line until we get a byte
> > sequence longer than BYTE.
> > 2. Delete the result of encoding the last line above.
> > 3. Provided that the above last line has chars C1 C2 ... Cn,
> > encode characters C1...Cn, C1...Cn-1, C1...Cn-2 until we get a
> > byte sequence shorter than BYTE.
> >
> > The first step may be optimized by encoding multiple lines instead of
> > a single line.
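
A minimal sketch of the three steps quoted above, written in Python rather than Emacs Lisp so that it can be run standalone. The function name, the 0-based return value, and the use of "not longer than BYTE" as the stopping test are assumptions made for illustration; the sketch also assumes each line can be encoded independently of the others, which may not hold for every coding system.

```python
def filepos_to_bufferpos_by_lines(text: str, byte: int, encoding: str = "utf-8") -> int:
    """Return how many leading characters of `text` encode to at most `byte` bytes.

    Hypothetical helper: it mirrors the quoted steps, not Emacs's actual code.
    """
    encoded_len = 0  # bytes produced by the lines accepted so far
    chars_done = 0   # characters covered by those lines
    # Step 1: encode line by line until the running total exceeds `byte`.
    for line in text.splitlines(keepends=True):
        line_len = len(line.encode(encoding))
        if encoded_len + line_len > byte:
            # Step 2: the encoding of this last line is discarded.
            # Step 3: retry with C1...Cn, C1...Cn-1, ... until the prefix fits.
            for n in range(len(line), -1, -1):
                if encoded_len + len(line[:n].encode(encoding)) <= byte:
                    return chars_done + n
        encoded_len += line_len
        chars_done += len(line)
    # `byte` lies at or beyond the end of the encoded text.
    return chars_done
```

For the text "ab\naéb\n" in UTF-8, where "é" occupies bytes 4 and 5, both offsets map to character index 4. Step 3 is linear in the length of the last line, which is one reason long lines are a concern for this approach.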

> Even if we do optimize, this would be very slow, I think.

Whether it is too slow or not depends on what filepos-to-bufferpos is
used for.  Do you know why filepos-to-bufferpos (and
bufferpos-to-filepos) were introduced?

> And what if the buffer has no newlines?

In that case, just do step 2.  Or, we can use the bisection
technique.
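
A minimal sketch of the bisection idea for text without newlines, again in Python for illustration only. It assumes the encoded length of a prefix never decreases as the prefix grows and that the empty string encodes to zero bytes (a coding system that emits a byte-order mark, for example, would break the second assumption). The function name is hypothetical.

```python
def filepos_to_bufferpos_bisect(text: str, byte: int, encoding: str = "utf-8") -> int:
    """Return how many leading characters of `text` encode to at most `byte` bytes.

    Hypothetical helper: binary search over the prefix length.
    """
    lo, hi = 0, len(text)  # invariant: the prefix of `lo` characters fits
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if len(text[:mid].encode(encoding)) <= byte:
            lo = mid       # the prefix fits, so try a longer one
        else:
            hi = mid - 1   # the prefix is too long, so try a shorter one
    return lo
```

Each probe re-encodes a prefix from the start, so the total cost is roughly proportional to the text length times the logarithm of its character count, instead of one encoding per candidate prefix.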

> In any case, the problem is not with encoding, the problem is with
> decoding.  Encoding doesn't have this problem because we always encode
> more than enough (we use the value of BYTE as the count of
> _characters_ to encode, so for ISO-2022 encoding it is usually much
> more than needed).  By contrast, when decoding, we decode exactly
> BYTE+1 bytes, which then hits the problem if that offset is inside a
> shift sequence.

Then, that implementation should be changed.

Any coding system can have :post-read-conversion and
:pre-write-conversion functions, so it is not guaranteed that the encoded
byte length is greater than the number of characters.
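
A toy Python illustration of this point. The conversion function below is entirely hypothetical and merely stands in for a :pre-write-conversion function; it is not taken from any real coding system. It shows that once text is transformed before encoding, the result can be shorter in bytes than the original text is in characters.

```python
from typing import Callable


def strip_soft_hyphens(text: str) -> str:
    """Hypothetical pre-write conversion: drop U+00AD SOFT HYPHEN characters."""
    return text.replace("\u00ad", "")


def encode_with_pre_write(text: str,
                          pre_write: Callable[[str], str],
                          encoding: str = "utf-8") -> bytes:
    """Apply the conversion first and then encode, as a stand-in for writing a file."""
    return pre_write(text).encode(encoding)


text = "co\u00adop"  # 5 characters
data = encode_with_pre_write(text, strip_soft_hyphens)
# `data` holds 4 bytes, fewer than the 5 characters of `text`.
```

With such a conversion in place, encoding BYTE characters is not guaranteed to yield at least BYTE bytes.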

---
K. Handa
handa <at> gnu.org




This bug report was last modified 3 years and 52 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.