GNU bug report logs - #37633
Column part interpreted wrong in compilation mode


Package: emacs;

Reported by: Bernd Paysan <bernd <at> net2o.de>

Date: Sat, 5 Oct 2019 15:45:01 UTC

Severity: normal

Tags: wontfix

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.



Message #50 received at 37633 <at> debbugs.gnu.org (full text, mbox):

From: Bernd Paysan <bernd <at> net2o.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: anton <at> mips.complang.tuwien.ac.at, 37633 <at> debbugs.gnu.org
Subject: Re: bug#37633: Column part interpreted wrong in compilation mode
Date: Sun, 06 Oct 2019 21:02:14 +0200
On Sunday, 6 October 2019, 19:53:49 CEST, Eli Zaretskii wrote:
> > Date: Sun, 6 Oct 2019 14:31:12 +0200
> > From: Anton Ertl <anton <at> mips.complang.tuwien.ac.at>
> > Cc: bernd <at> net2o.de, 37633 <at> debbugs.gnu.org,
> > anton <at> mips.complang.tuwien.ac.at
> > 
> > On Sat, Oct 05, 2019 at 07:16:53PM +0300, Eli Zaretskii wrote:
> > > For byte offsets in external text we have bufferpos-to-filepos, but
> > > that requires us to know the encoding of the external text.  We need
> > > to find a reasonable way of getting that.  Suggestions and patches
> > > welcome.
> > 
> > It's the encoding that you assumed for the text when you loaded the
> > file into the buffer.
> 
> I'm not sure this is correct.  You are saying that the compiler counts
> bytes in the original file, not in its output (which might be encoded
> differently).  Do we have conclusive evidence that this is always
> true?

Almost always.  gcc has a gazillion options that almost nobody uses.

E.g., you can use -finput-charset=<charset> to transcode input files on 
reading.  It is not a well-tested option, as the output (still iso8859-1) 
shows:

% gcc -finput-charset=iso8859-1 test-iso.c
test-iso.c: In function ‘foo’:
test-iso.c:2:2: warning: implicit declaration of function ‘printf’ [-Wimplicit-function-declaration]
    2 |  printf("test %i", b);
      |  ^~~~~~
test-iso.c:2:2: warning: incompatible implicit declaration of built-in function ‘printf’
test-iso.c:1:1: note: include ‘<stdio.h>’ or provide a declaration of ‘printf’
  +++ |+#include <stdio.h>
    1 | void foo() {
test-iso.c:2:20: error: ‘b’ undeclared (first use in this function)
    2 |  printf("test %i", b);
      |                    ^
test-iso.c:2:20: note: each undeclared identifier is reported only once for each function it appears in
test-iso.c:3:26: error: ‘c’ undeclared (first use in this function)
    3 |  printf("test��� %i", c);
      |                          ^

Here, due to the conversion on reading, the reported position is different (it 
was 3:23 before).
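
The 23 versus 26 difference can be checked with a small Python sketch.  It is 
only an illustration: the content of line 3 of test-iso.c is not quoted above, 
so the string below is a reconstruction inferred from the reported columns and 
from the "äöü" example, and the helper name byte_column is made up here.  It 
also assumes that gcc reports a 1-based byte column, counted after any 
conversion of the input to UTF-8.

```python
# Hypothetical reconstruction of line 3 of test-iso.c (inferred, not quoted).
LINE = ' printf("testäöü %i", c);'


def byte_column(line: str, token: str, encoding: str) -> int:
    """1-based byte column of the last occurrence of token, measured in encoding."""
    idx = line.rindex(token)
    # Count the bytes that precede the token once the line is encoded.
    return len(line[:idx].encode(encoding)) + 1


# File stored as ISO-8859-1 and read without conversion: one byte per umlaut.
col_latin1 = byte_column(LINE, "c", "iso-8859-1")
# Same line after conversion to UTF-8: two bytes per umlaut.
col_utf8 = byte_column(LINE, "c", "utf-8")

print(col_latin1, col_utf8)
```

Under these assumptions the two columns differ by exactly three, one extra byte 
for each of the three umlauts.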

This transparent conversion on reading is rarely used.  Or rather: a search 
of all of GitHub does not turn up a single use.

> > the byte position does not depend on the encoding (unlike the
> > character position).
> 
> ??? The same Latin-1 characters encoded in ISO-8859-1 and in UTF-8
> will yield a different number of bytes.  So I don't think I understand
> how can you say the above.

What I'm trying to say: the compiler (unless instructed to convert the file 
on reading) reports the byte position it found in the file.  That's the same 
byte position the editor calculates for that file, regardless of what the 
editor assumed as encoding.  I.e., if the editor mistook a UTF-8 file for an 
iso8859-1 one, it will see the UTF-8 string "äöü" (6 bytes in UTF-8) as 
"Ã¤Ã¶Ã¼" (6 bytes in iso8859-1).  But it's still 6 bytes.
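
A short Python sketch of that point, for illustration only (the variable names 
are made up here): the bytes of the file stay the same no matter which 
encoding the editor guesses; only the characters shown for them change.

```python
text = "äöü"

# The bytes as they are stored in a UTF-8 file.
raw = text.encode("utf-8")

# An editor that wrongly assumes ISO-8859-1 maps every byte to one character.
misread = raw.decode("iso-8859-1")

# Writing the misread text back in the assumed encoding restores the bytes.
roundtrip = misread.encode("iso-8859-1")

print(len(raw), len(misread), roundtrip == raw)
```

The byte count is 6 in both views, so a byte offset reported by the compiler 
points at the same place in the file either way.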

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself"
net2o id: kQusJzA;7*?t=uy <at> X}1GWr!+0qqp_Cn176t4(dQ*
https://net2o.de/

This bug report was last modified 3 years and 86 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.