GNU bug report logs -
#44411
28.0.50; uudecode-decode-region-internal is broken
Previous Next
Reported by: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
Date: Tue, 3 Nov 2020 08:28:02 UTC
Severity: normal
Tags: patch
Found in version 28.0.50
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 44411 in the body.
You can then email your comments to 44411 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#44411
; Package
emacs
.
(Tue, 03 Nov 2020 08:28:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Tue, 03 Nov 2020 08:28:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
When I call uudecode-decode-region-internal in multibyte buffer, it
fails to decode eight-bit characters.
The function makes string from uuencoded text by passing unsigned char
vlue (0-255) to char-to-string function, which makes multibyte-string.
After that, string is decoded as binary. But eight-bit characters are
never made in that way.
(let ((ch #xc8))
(decode-coding-string (char-to-string ch) 'binary))
-> "8"
Additionally, concat and char-to-string functions are called so
frequently that deocder is very slow for large data.
Please see the below patch.
diff --git a/lisp/mail/uudecode.el b/lisp/mail/uudecode.el
index bcbd571b53..f9254aee75 100644
--- a/lisp/mail/uudecode.el
+++ b/lisp/mail/uudecode.el
@@ -149,12 +149,10 @@ uudecode-decode-region-internal
(setq counter (1+ counter)
inputpos (1+ inputpos))
(cond ((= counter 4)
- (setq result (cons
- (concat
- (char-to-string (ash bits -16))
- (char-to-string (logand (ash bits -8) 255))
- (char-to-string (logand bits 255)))
- result))
+ (setq result (cons (logand bits 255)
+ (cons (logand (ash bits -8) 255)
+ (cons (ash bits -16)
+ result))))
(setq bits 0 counter 0))
(t (setq bits (ash bits 6)))))))
(cond
@@ -166,26 +164,21 @@ uudecode-decode-region-internal
;;(error "uucode ends unexpectedly")
(setq done t))
((= counter 3)
- (setq result (cons
- (concat
- (char-to-string (logand (ash bits -16) 255))
- (char-to-string (logand (ash bits -8) 255)))
- result)))
+ (setq result (cons (logand (ash bits -8) 255)
+ (cons (logand (ash bits -16) 255)
+ result))))
((= counter 2)
- (setq result (cons
- (char-to-string (logand (ash bits -10) 255))
- result))))
+ (setq result (cons (logand (ash bits -10) 255)
+ result))))
(skip-chars-forward non-data-chars end))
+ (setq result (apply #'unibyte-string (nreverse result)))
(if file-name
(with-temp-file file-name
(set-buffer-multibyte nil)
- (insert (apply #'concat (nreverse result))))
+ (insert result))
(or (markerp end) (setq end (set-marker (make-marker) end)))
(goto-char start)
- (if enable-multibyte-characters
- (dolist (x (nreverse result))
- (insert (decode-coding-string x 'binary)))
- (insert (apply #'concat (nreverse result))))
+ (insert result)
(delete-region (point) end))))))
;;;###autoload
--
Kazuhiro Ito
Added tag(s) patch.
Request was from
Lars Ingebrigtsen <larsi <at> gnus.org>
to
control <at> debbugs.gnu.org
.
(Tue, 03 Nov 2020 15:08:01 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#44411
; Package
emacs
.
(Tue, 03 Nov 2020 15:10:02 GMT)
Full text and
rfc822 format available.
Message #10 received at 44411 <at> debbugs.gnu.org (full text, mbox):
Kazuhiro Ito <kzhr <at> d1.dion.ne.jp> writes:
> The function makes string from uuencoded text by passing unsigned char
> vlue (0-255) to char-to-string function, which makes multibyte-string.
> After that, string is decoded as binary. But eight-bit characters are
> never made in that way.
>
> (let ((ch #xc8))
> (decode-coding-string (char-to-string ch) 'binary))
>
> -> "8"
>
> Additionally, concat and char-to-string functions are called so
> frequently that deocder is very slow for large data.
>
> Please see the below patch.
Thanks; looks good to me. This patch is slightly too large to apply
without having an FSF copyright on file, and I don't see that in the
assignment file for you.
Would you be willing to sign such paperwork so that we can get the patch
applied?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#44411
; Package
emacs
.
(Tue, 03 Nov 2020 15:37:01 GMT)
Full text and
rfc822 format available.
Message #13 received at 44411 <at> debbugs.gnu.org (full text, mbox):
> Date: Tue, 03 Nov 2020 17:27:41 +0900
> From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
>
> When I call uudecode-decode-region-internal in multibyte buffer, it
> fails to decode eight-bit characters.
What do you mean by "decode eight-bit characters"? There's no such
thing in the "real world", it is entirely an Emacs invention. How
could such "characters" appear in uuencoded email message?
Can you describe the real-life use case where this happened?
> Additionally, concat and char-to-string functions are called so
> frequently that deocder is very slow for large data.
Please show only the changes to fix what you think is a bug. let's
leave the optimization alone for a moment, so that it doesn't muddy
the waters.
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#44411
; Package
emacs
.
(Wed, 04 Nov 2020 08:28:01 GMT)
Full text and
rfc822 format available.
Message #16 received at 44411 <at> debbugs.gnu.org (full text, mbox):
> > When I call uudecode-decode-region-internal in multibyte buffer, it
> > fails to decode eight-bit characters.
>
> What do you mean by "decode eight-bit characters"?
I mean "decode uuencoded raw bytes 128-255 as eight-bit characters".
> How could such "characters" appear in uuencoded email message?
I did not said anything about uuencoded email message. What do you
mean?
> Can you describe the real-life use case where this happened?
1. Yank below uuencoded string into multibyte buffer.
begin 644 c8c8c8c8.bin
$R,C(R```@
``
end
size 4
2. C-SPC at the beginning of uuencoded text and move point to the end
of uuencoded text.
3. M-x uudecode-decode-region-internal
4. decoded result is broken. Original data is 4bytes of 0xc8, but
inserted text is "8888". uudecode-decode-region-external returns
expected result.
> Please show only the changes to fix what you think is a bug.
Here is.
diff --git a/lisp/mail/uudecode.el b/lisp/mail/uudecode.el
index bcbd571b53..81a0a8ae0d 100644
--- a/lisp/mail/uudecode.el
+++ b/lisp/mail/uudecode.el
@@ -151,9 +151,9 @@ uudecode-decode-region-internal
(cond ((= counter 4)
(setq result (cons
(concat
- (char-to-string (ash bits -16))
- (char-to-string (logand (ash bits -8) 255))
- (char-to-string (logand bits 255)))
+ (unibyte-string (ash bits -16))
+ (unibyte-string (logand (ash bits -8) 255))
+ (unibyte-string (logand bits 255)))
result))
(setq bits 0 counter 0))
(t (setq bits (ash bits 6)))))))
@@ -168,12 +168,12 @@ uudecode-decode-region-internal
((= counter 3)
(setq result (cons
(concat
- (char-to-string (logand (ash bits -16) 255))
- (char-to-string (logand (ash bits -8) 255)))
+ (unibyte-string (logand (ash bits -16) 255))
+ (unibyte-string (logand (ash bits -8) 255)))
result)))
((= counter 2)
(setq result (cons
- (char-to-string (logand (ash bits -10) 255))
+ (unibyte-string (logand (ash bits -10) 255))
result))))
(skip-chars-forward non-data-chars end))
(if file-name
@@ -184,7 +184,7 @@ uudecode-decode-region-internal
(goto-char start)
(if enable-multibyte-characters
(dolist (x (nreverse result))
- (insert (decode-coding-string x 'binary)))
+ (insert x))
(insert (apply #'concat (nreverse result))))
(delete-region (point) end))))))
--
Kazuhiro Ito
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#44411
; Package
emacs
.
(Wed, 04 Nov 2020 15:28:02 GMT)
Full text and
rfc822 format available.
Message #19 received at 44411 <at> debbugs.gnu.org (full text, mbox):
> Date: Wed, 04 Nov 2020 17:26:56 +0900
> From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> Cc: 44411 <at> debbugs.gnu.org
>
> > What do you mean by "decode eight-bit characters"?
>
> I mean "decode uuencoded raw bytes 128-255 as eight-bit characters".
>
> > How could such "characters" appear in uuencoded email message?
>
> I did not said anything about uuencoded email message. What do you
> mean?
Sorry, it looks like I misunderstood the use case. Ignore that
question.
> 1. Yank below uuencoded string into multibyte buffer.
>
> begin 644 c8c8c8c8.bin
> $R,C(R```@
> ``
> end
> size 4
>
> 2. C-SPC at the beginning of uuencoded text and move point to the end
> of uuencoded text.
>
> 3. M-x uudecode-decode-region-internal
>
> 4. decoded result is broken. Original data is 4bytes of 0xc8, but
> inserted text is "8888". uudecode-decode-region-external returns
> expected result.
OK, I see the problem now: it's the call to decode-coding-string,
which replaced string-as-unibyte of yore.
> > Please show only the changes to fix what you think is a bug.
>
> Here is.
OK, I agree also to your other optimizations, but can we please go a
step further and avoid consing a string here? 'insert' is perfectly
capable of inserting characters, not only strings. So in the
unibyte-buffer case, just something like
(apply #'insert (nreverse result))
with 'result' being a list of bytes, will produce what you want. And
in the multibyte-buffer case you just need to convert each byte to its
Unicode-compatible codepoint, by using
(decode-char 'eight-bit CH)
probably in 'mapcar' or somesuch, and then call 'insert'.
Can you augment your patch along these lines, please?
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#44411
; Package
emacs
.
(Thu, 05 Nov 2020 10:49:01 GMT)
Full text and
rfc822 format available.
Message #22 received at 44411 <at> debbugs.gnu.org (full text, mbox):
> OK, I agree also to your other optimizations, but can we please go a
> step further and avoid consing a string here? 'insert' is perfectly
> capable of inserting characters, not only strings. So in the
> unibyte-buffer case, just something like
>
> (apply #'insert (nreverse result))
>
> with 'result' being a list of bytes, will produce what you want. And
> in the multibyte-buffer case you just need to convert each byte to its
> Unicode-compatible codepoint, by using
>
> (decode-char 'eight-bit CH)
>
> probably in 'mapcar' or somesuch, and then call 'insert'.
>
> Can you augment your patch along these lines, please?
Here is a revised one.
diff --git a/lisp/mail/uudecode.el b/lisp/mail/uudecode.el
index bcbd571b53..0dce9b7b72 100644
--- a/lisp/mail/uudecode.el
+++ b/lisp/mail/uudecode.el
@@ -149,12 +149,10 @@ uudecode-decode-region-internal
(setq counter (1+ counter)
inputpos (1+ inputpos))
(cond ((= counter 4)
- (setq result (cons
- (concat
- (char-to-string (ash bits -16))
- (char-to-string (logand (ash bits -8) 255))
- (char-to-string (logand bits 255)))
- result))
+ (setq result (cons (logand bits 255)
+ (cons (logand (ash bits -8) 255)
+ (cons (ash bits -16)
+ result))))
(setq bits 0 counter 0))
(t (setq bits (ash bits 6)))))))
(cond
@@ -166,26 +164,26 @@ uudecode-decode-region-internal
;;(error "uucode ends unexpectedly")
(setq done t))
((= counter 3)
- (setq result (cons
- (concat
- (char-to-string (logand (ash bits -16) 255))
- (char-to-string (logand (ash bits -8) 255)))
- result)))
+ (setq result (cons (logand (ash bits -8) 255)
+ (cons (logand (ash bits -16) 255)
+ result))))
((= counter 2)
- (setq result (cons
- (char-to-string (logand (ash bits -10) 255))
- result))))
+ (setq result (cons (logand (ash bits -10) 255)
+ result))))
(skip-chars-forward non-data-chars end))
(if file-name
(with-temp-file file-name
(set-buffer-multibyte nil)
- (insert (apply #'concat (nreverse result))))
+ (apply #'insert (nreverse result)))
(or (markerp end) (setq end (set-marker (make-marker) end)))
(goto-char start)
- (if enable-multibyte-characters
- (dolist (x (nreverse result))
- (insert (decode-coding-string x 'binary)))
- (insert (apply #'concat (nreverse result))))
+ (apply #'insert
+ (nreverse
+ (if enable-multibyte-characters
+ (mapcar (lambda (ch)
+ (or (decode-char 'eight-bit ch) ch))
+ result)
+ result)))
(delete-region (point) end))))))
;;;###autoload
--
Kazuhiro Ito
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#44411
; Package
emacs
.
(Thu, 05 Nov 2020 13:49:02 GMT)
Full text and
rfc822 format available.
Message #25 received at 44411 <at> debbugs.gnu.org (full text, mbox):
> Date: Thu, 05 Nov 2020 19:48:08 +0900
> From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> Cc: 44411 <at> debbugs.gnu.org
>
> Here is a revised one.
Thanks, this LGTM. I will wait for a couple of days, and push then in
your name if no new comments are voiced.
Reply sent
to
Eli Zaretskii <eliz <at> gnu.org>
:
You have taken responsibility.
(Sat, 07 Nov 2020 09:44:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
:
bug acknowledged by developer.
(Sat, 07 Nov 2020 09:44:02 GMT)
Full text and
rfc822 format available.
Message #30 received at 44411-done <at> debbugs.gnu.org (full text, mbox):
> Date: Thu, 05 Nov 2020 19:48:08 +0900
> From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> Cc: 44411 <at> debbugs.gnu.org
>
> > Can you augment your patch along these lines, please?
>
> Here is a revised one.
Thanks, pushed to the emacs-27 branch.
Since with this patch you have exhausted the amount of changes we can
accept from you without copyright assignment, would you like to start
the assignment process at this time?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#44411
; Package
emacs
.
(Mon, 09 Nov 2020 15:48:02 GMT)
Full text and
rfc822 format available.
Message #33 received at 44411 <at> debbugs.gnu.org (full text, mbox):
> > > Can you augment your patch along these lines, please?
> >
> > Here is a revised one.
>
> Thanks, pushed to the emacs-27 branch.
Thank you.
> Since with this patch you have exhausted the amount of changes we can
> accept from you without copyright assignment, would you like to start
> the assignment process at this time?
I've stared the process and requested the assignment form.
--
Kazuhiro Ito
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Tue, 08 Dec 2020 12:24:04 GMT)
Full text and
rfc822 format available.
This bug report was last modified 4 years and 190 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.