#38587 - base64-decode-region breaks encoding

GNU bug report logs - #38587
base64-decode-region breaks encoding

Package: emacs

Reported by: Juri Linkov <juri <at> linkov.net>

Date: Fri, 13 Dec 2019 00:04:01 UTC

Severity: normal

Tags: wontfix

Fixed in version 27.0.50

Done: Juri Linkov <juri <at> linkov.net>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: larsi <at> gnus.org, schwab <at> linux-m68k.org, 38587 <at> debbugs.gnu.org
Subject: bug#38587: base64-decode-region breaks encoding
Date: Mon, 16 Dec 2019 17:58:29 +0200

> From: Juri Linkov <juri <at> linkov.net>
> Date: Mon, 16 Dec 2019 00:40:55 +0200
> Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 38587 <at> debbugs.gnu.org
>
> > BASE64 is defined on a sequence of bytes.  It doesn't make sense to
> > apply it to characters.
>
> But isn't UTF-8 a multibyte encoding represented by a sequence of bytes
> (e.g. when saved to a file)?

When saved to a file, yes.

> Then why base64-encode-region couldn't use the buffer's coding
> to convert the region to a sequence of bytes?

Because it isn't guaranteed that the buffer's encoding is indeed the
right one for this job.

> Also why base64-encode-region accepts region's characters
> only from the charsets ‘eight-bit-control’ and ‘eight-bit-graphic’,
> but not other UTF-8 characters?

Because it wants raw bytes, and only eight-bit charsets fit that
condition.  Eight-bit charset is the charset of raw bytes in a
multibyte buffer or string.  (base64-encode-region can also work on
unibyte buffers and strings, in which case "charset" of such "text"
has no meaning.)

> > The input of base64-encode-region needs to be encoded into bytes and the
> > output of base64-decode-region needs to be decoded into characters.  If
> > you do that, you get a full reversible operation.
>
> I guess base64-encode-region already encodes the region into bytes,
> but only partially - it signals an error on some characters,
> I don't understand why it can't encode all of them.

Once again, because it wants to process only raw bytes.

> But is it still possible to tell base64-decode-region
> about the expected output coding system?  Maybe using
> a prefix arg: C-u M-x base64-decode-region could ask
> for a coding, defaulting to the buffer's coding.

If we want to make such a change, then "C-x RET c" is a better prefix
command, as it is consistent with other commands that accept
coding-system overrides.

> Is there an equivalent of force_encoding('UTF-8') in Emacs?

"C-x RET c utf-8 RET M-x SOME-COMMAND RET"

> Also this doesn't work on the string output:
>
> (decode-coding-string (base64-decode-string (base64-encode-string "ä"))
>                       'utf-8)

It will work if you encode "ä" first:

  (decode-coding-string
   (base64-decode-string
    (base64-encode-string (encode-coding-string "ä" 'utf-8)))
   'utf-8)

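The encode-then-base64 and base64-then-decode order described in the message can be wrapped in a pair of small helpers. This is only a sketch: the names `my-base64-encode-text` and `my-base64-decode-text` are hypothetical and not part of Emacs, and the default of utf-8 is an assumption made for illustration, not something the functions can know about your data.

```elisp
;; Hypothetical helpers (not part of Emacs) illustrating the round trip:
;; characters -> bytes -> base64, and base64 -> bytes -> characters.

(defun my-base64-encode-text (text &optional coding-system)
  "Encode TEXT into bytes with CODING-SYSTEM, then base64-encode the bytes.
CODING-SYSTEM defaults to `utf-8' (an assumption of this sketch)."
  (base64-encode-string
   ;; base64 wants raw bytes, so turn the characters into bytes first.
   (encode-coding-string text (or coding-system 'utf-8))
   t))                                  ; t = do not insert line breaks

(defun my-base64-decode-text (b64 &optional coding-system)
  "Base64-decode B64 into bytes, then decode the bytes with CODING-SYSTEM.
CODING-SYSTEM defaults to `utf-8' (an assumption of this sketch)."
  (decode-coding-string
   ;; base64-decode-string returns a unibyte string of raw bytes.
   (base64-decode-string b64)
   (or coding-system 'utf-8)))

;; Round trip: the bytes of "ä" in UTF-8 are C3 A4, i.e. "w6Q=" in base64.
(my-base64-decode-text (my-base64-encode-text "ä"))
```

The caller has to pass the same coding system in both directions; as noted above, nothing guarantees that the buffer's own coding system is the right one for the payload.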
This bug report was last modified 5 years and 148 days ago.

