#38587 - base64-decode-region breaks encoding

GNU bug report logs - #38587
base64-decode-region breaks encoding

Package: emacs

Reported by: Juri Linkov <juri <at> linkov.net>

Date: Fri, 13 Dec 2019 00:04:01 UTC

Severity: normal

Tags: wontfix

Fixed in version 27.0.50

Done: Juri Linkov <juri <at> linkov.net>

Bug is archived. No further changes may be made.

Message #23 received at 38587 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 38587 <at> debbugs.gnu.org
Subject: Re: bug#38587: base64-decode-region breaks encoding
Date: Mon, 16 Dec 2019 00:40:55 +0200

>> Maybe an additional CODING arg for base64-decode-region?
>
> BASE64 is defined on a sequence of bytes.  It doesn't make sense to
> apply it to characters.

But isn't UTF-8 a multibyte encoding represented by a sequence of bytes
(e.g. when saved to a file)?  Then why base64-encode-region couldn't
use the buffer's coding to convert the region to a sequence of bytes?
Also why base64-encode-region accepts region's characters only
from the charsets ‘eight-bit-control’ and ‘eight-bit-graphic’,
but not other UTF-8 characters?

> The input of base64-encode-region needs to be encoded into bytes and the
> output of base64-decode-region needs to be decoded into characters.  If
> you do that, you get a full reversible operation.

I guess base64-encode-region already encodes the region into bytes,
but only partially - it signals an error on some characters,
I don't understand why it can't encode all of them.

>> Or it would be enough to use the coding system of the
>> output buffer?
>
> The coding system of the output buffer has nothing to do with the coding
> of the data produced by base64-decode-region, just like
> process-coding-system is independent from the coding system of the
> process buffer.

It's understandable that the coding system of the output buffer
is not necessarily the same as expected from the output of
base64-decode-region.  But is it still possible to tell
base64-decode-region about the expected output coding system?
Maybe using a prefix arg: C-u M-x base64-decode-region
could ask for a coding, defaulting to the buffer's coding.

For example, in Ruby

  require 'base64'
  Base64.decode64(Base64.encode64("☃"))
  => "\xE2\x98\x83"

indeed outputs ASCII not encoded to UTF-8.  But it's possible
to force encoding with:

  Base64.decode64(Base64.encode64("☃")).force_encoding('UTF-8')
  => "☃"

Is there an equivalent of force_encoding('UTF-8') in Emacs?
I tried to call after base64-decode-region on its output:

  (decode-coding-region (point-min) (point-max) 'binary)

but it doesn't work, neither this:

  (encode-coding-region (point-min) (point-max) 'utf-8)

Also this doesn't work on the string output:

  (decode-coding-string (base64-decode-string (base64-encode-string "ä")) 'utf-8)

Maybe I'm doing something wrong?
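
The round trip described in the quoted reply (encode characters into bytes, apply base64, then decode the resulting bytes back into characters) can be sketched in Ruby, the language of the example in this message. This is only an illustrative sketch: the helper names encode_text and decode_text are hypothetical and do not come from the thread, UTF-8 is assumed as the default coding, and strict_encode64 is used instead of encode64 only to avoid the trailing newline.

```ruby
require 'base64'

# Hypothetical helper: characters -> bytes -> base64 text.
# The explicit encode step mirrors "the input needs to be encoded into bytes".
def encode_text(text, encoding = Encoding::UTF_8)
  bytes = text.encode(encoding).b   # .b returns a copy tagged as binary
  Base64.strict_encode64(bytes)     # no line breaks or trailing newline
end

# Hypothetical helper: base64 text -> bytes -> characters.
# Base64.decode64 returns a binary string; force_encoding only relabels
# those bytes as the given encoding, it does not change them.
def decode_text(b64, encoding = Encoding::UTF_8)
  bytes = Base64.decode64(b64)
  bytes.force_encoding(encoding)
end

round_trip = decode_text(encode_text("☃"))
puts round_trip.encoding   # prints UTF-8
```

In Emacs the analogous steps would presumably be encode-coding-string before base64-encode-string and decode-coding-string after base64-decode-string; that mapping is an assumption and is not confirmed in this message.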

This bug report was last modified 5 years and 181 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
