GNU bug report logs - #29189
25.3; Dired does not work with binary filenames

Package: emacs;

Reported by: Allen Li <vianchielfaura <at> gmail.com>

Date: Tue, 7 Nov 2017 09:04:01 UTC

Severity: minor

Found in version 25.3

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29189 in the body.
You can then email your comments to 29189 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Tue, 07 Nov 2017 09:04:02 GMT)

Acknowledgement sent to Allen Li <vianchielfaura <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 07 Nov 2017 09:04:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Allen Li <vianchielfaura <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 25.3; Dired does not work with binary filenames
Date: Tue, 7 Nov 2017 01:03:24 -0800
Dired does not work with binary filenames

For example, create such a file with Bash:

touch $'\265'

1. Navigate to the directory containing said file with Dired
2. Mark file for deletion with d
3. Press x to delete the flagged file

Expected:

File deleted

Actual:

(file-error Removing old name No such file or directory /home/bob/tmp/\300\265)
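The error is consistent with Emacs handing the system the two bytes \300\265 instead of the single byte \265. The minimal C sketch below, assuming a Linux filesystem that accepts arbitrary bytes in file names, reproduces that situation outside Emacs: a file created as \265 cannot be removed under the name \300\265, but can be removed under its real name. The helper names (open_scratch_dir, touch_at, remove_at) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* for mkdtemp, openat, unlinkat, O_DIRECTORY */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a fresh directory from TMPL (which must end in "XXXXXX") and
   return a file descriptor for it, or -1 on failure.  */
static int
open_scratch_dir (char *tmpl)
{
  if (mkdtemp (tmpl) == NULL)
    return -1;
  return open (tmpl, O_RDONLY | O_DIRECTORY);
}

/* Create an empty file NAME inside directory DIRFD, like `touch'.
   Return 0 on success, -1 on failure.  */
static int
touch_at (int dirfd, const char *name)
{
  assert (name[0] != '\0');
  int fd = openat (dirfd, name, O_WRONLY | O_CREAT, 0644);
  if (fd < 0)
    return -1;
  return close (fd);
}

/* Remove NAME inside directory DIRFD.  Return 0 on success, otherwise
   the errno value reported by unlinkat.  */
static int
remove_at (int dirfd, const char *name)
{
  return unlinkat (dirfd, name, 0) == 0 ? 0 : errno;
}
```

For example, after touch_at (dfd, "\265"), the call remove_at (dfd, "\300\265") returns ENOENT ("No such file or directory"), while remove_at (dfd, "\265") succeeds.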

In GNU Emacs 25.3.1 (x86_64-pc-linux-gnu, GTK+ Version 3.22.19)
 of 2017-09-16 built on juergen
Windowing system distributor 'The X.Org Foundation', version 11.0.11905000
Configured using:
 'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
 --localstatedir=/var --with-x-toolkit=gtk3 --with-xft --with-modules
 'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong
 -fno-plt' CPPFLAGS=-D_FORTIFY_SOURCE=2
 LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GCONF GSETTINGS
NOTIFY ACL GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Tue, 07 Nov 2017 10:01:02 GMT)

Message #8 received at 29189 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> suse.de>
To: Allen Li <vianchielfaura <at> gmail.com>
Cc: 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Tue, 07 Nov 2017 11:00:13 +0100
On Nov 07 2017, Allen Li <vianchielfaura <at> gmail.com> wrote:

> Dired does not work with binary filenames
>
> For example, create such a file with Bash:
>
> touch $'\265'
>
> 1. Navigate to the directory containing said file with Dired
> 2. Mark file for deletion with d
> 3. x
>
> Expected:
>
> File deleted

It works if you add b to dired-listing-switches, though the buffer isn't
properly updated afterwards.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab <at> suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Tue, 07 Nov 2017 17:09:01 GMT)

Message #11 received at 29189 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Schwab <schwab <at> suse.de>
Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Tue, 07 Nov 2017 19:08:42 +0200
> From: Andreas Schwab <schwab <at> suse.de>
> Date: Tue, 07 Nov 2017 11:00:13 +0100
> Cc: 29189 <at> debbugs.gnu.org
> 
> > touch $'\265'
> >
> > 1. Navigate to the directory containing said file with Dired
> > 2. Mark file for deletion with d
> > 3. x
> >
> > Expected:
> >
> > File deleted
> 
> It works if you add b to dired-listing-switches, though the buffer isn't
> properly updated afterwards.

Here (with the latest emacs-26 branch) it doesn't produce any error
messages, and the buffer is updated to remove the name of that file,
but 'g' brings it back, so it wasn't really deleted.

I stepped into Fdelete_file, and saw that the file name is correctly
encoded before calling 'unlink'.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Wed, 08 Nov 2017 05:13:02 GMT)

Message #14 received at 29189 <at> debbugs.gnu.org:

From: Allen Li <vianchielfaura <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Andreas Schwab <schwab <at> suse.de>, 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Tue, 7 Nov 2017 21:12:26 -0800
On Tue, Nov 7, 2017 at 9:08 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> From: Andreas Schwab <schwab <at> suse.de>
>> It works if you add b to dired-listing-switches, though the buffer isn't
>> properly updated afterwards.
>
> Here (with the latest emacs-26 branch) it doesn't produce any error
> messages, and the buffer is updated to remove the name of that file,
> but 'g' brings it back, so it wasn't really deleted.
>
> I stepped into Fdelete_file, and saw that the file name is correctly
> encoded before calling 'unlink'.

I can confirm the behavior Andreas describes on Emacs 25.  There is a
slight bug where the buffer is not updated after the deletion, but the
deletion itself works.

I do not know about Emacs 26, but it sounds like there's a regression.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Wed, 08 Nov 2017 06:23:02 GMT)

Message #17 received at 29189 <at> debbugs.gnu.org:

From: Allen Li <vianchielfaura <at> gmail.com>
To: Andreas Schwab <schwab <at> suse.de>
Cc: 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Tue, 7 Nov 2017 22:22:01 -0800
On Tue, Nov 7, 2017 at 2:00 AM, Andreas Schwab <schwab <at> suse.de> wrote:
> It works if you add b to dired-listing-switches, though the buffer isn't
> properly updated afterwards.

I just discovered that -b does not play well with wdired.  Should I
file a separate bug for that?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Wed, 08 Nov 2017 08:46:02 GMT)

Message #20 received at submit <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: bug-gnu-emacs <at> gnu.org, Allen Li <vianchielfaura <at> gmail.com>,
 Andreas Schwab <schwab <at> suse.de>
Cc: 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Wed, 08 Nov 2017 10:44:47 +0200
On November 8, 2017 8:22:01 AM GMT+02:00, Allen Li <vianchielfaura <at> gmail.com> wrote:
> On Tue, Nov 7, 2017 at 2:00 AM, Andreas Schwab <schwab <at> suse.de> wrote:
> > It works if you add b to dired-listing-switches, though the buffer
> > isn't properly updated afterwards.
> 
> I just discovered that -b does not play well with wdired.  Should I
> file a separate bug for that?

I don't understand why -b should be used at all.  In my testing, it
wasn't needed.  Maybe this depends on the locale?  What's yours?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sat, 11 Nov 2017 07:01:02 GMT)

Message #26 received at submit <at> debbugs.gnu.org:

From: Allen Li <vianchielfaura <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Andreas Schwab <schwab <at> suse.de>, bug-gnu-emacs <at> gnu.org,
 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Fri, 10 Nov 2017 22:59:56 -0800
On Wed, Nov 8, 2017 at 12:44 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> I don't understand why -b should be used at all.  In my testing, it
> wasn't needed.  Maybe this depends on the locale?  What's yours?

en_US.UTF-8

Without -b, the filename in Dired is two binary characters, \300 and
\265.  With -b, the filename in Dired is the four characters \265.

I'm using Emacs 25.3.1 and ls (GNU coreutils) 8.28

It sounds like Andreas is seeing the same behavior as me.  What
behavior are you seeing when deleting the file with -b vs no -b?
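The four characters seen with -b come from ls itself: -b (--escape) prints nongraphic bytes as C-style octal escapes, so the single byte \265 is shown as a backslash followed by the digits 265. A simplified C model follows; escape_name is a hypothetical helper, and GNU ls applies further rules (for example to spaces and to characters that are valid in the current locale) that this sketch does not reproduce.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified model of `ls -b' for one file name: a backslash becomes
   "\\", bytes outside printable ASCII become \ooo octal escapes, and
   everything else is copied unchanged.  OUT must have room for
   4 * strlen (NAME) + 1 bytes.  */
static void
escape_name (const char *name, char *out)
{
  assert (name != NULL && out != NULL);
  for (const unsigned char *p = (const unsigned char *) name; *p; p++)
    {
      if (*p == '\\')
        {
          *out++ = '\\';
          *out++ = '\\';
        }
      else if (*p < 0x20 || *p > 0x7E)
        out += sprintf (out, "\\%03o", (unsigned) *p);
      else
        *out++ = (char) *p;
    }
  *out = '\0';
}
```

Under this model the one-byte name "\265" becomes the four-character string `\265`, which is what Dired then has to parse back into a file name.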




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sat, 11 Nov 2017 08:21:02 GMT)

Message #32 received at 29189 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Allen Li <vianchielfaura <at> gmail.com>
Cc: schwab <at> suse.de, 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sat, 11 Nov 2017 10:20:36 +0200
> From: Allen Li <vianchielfaura <at> gmail.com>
> Date: Fri, 10 Nov 2017 22:59:56 -0800
> Cc: bug-gnu-emacs <at> gnu.org, Andreas Schwab <schwab <at> suse.de>, 29189 <at> debbugs.gnu.org
> 
> On Wed, Nov 8, 2017 at 12:44 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> > I don't understand why -b should be used at all.  In my testing, it
> > wasn't needed.  Maybe this depends on the locale?  What's yours?
> 
> en_US.UTF-8
> 
> Without -b, the filename in Dired is two binary characters, \300 and
> \265.  With -b, the filename in Dired is four characters, \265

Sorry, it seems I was confused.  You didn't originally say what file
name you expected to see in Dired.  If the expected file name is \265,
a single byte, but you see \300\265 instead, then the problem is not
in deletion, the problem is in how Dired prepares file names for
display.  I will look into that when I have time, if no one beats me
to it.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sat, 11 Nov 2017 14:19:02 GMT)

Message #35 received at 29189 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: vianchielfaura <at> gmail.com, Kenichi Handa <handa <at> gnu.org>
Cc: schwab <at> suse.de, 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sat, 11 Nov 2017 16:18:20 +0200
> Date: Sat, 11 Nov 2017 10:20:36 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: schwab <at> suse.de, 29189 <at> debbugs.gnu.org
> 
> > Without -b, the filename in Dired is two binary characters, \300 and
> > \265.  With -b, the filename in Dired is four characters, \265
> 
> Sorry, it seems I was confused.  You didn't originally say what file
> name you expected to see in Dired.  If the expected file name is \265,
> a single byte, but you see \300\265 instead, then the problem is not
> in deletion, the problem is in how Dired prepares file names for
> display.  I will look into that when I have time, if no one beats me
> to it.

The problem is in insert-directory.  It manually decodes each file
name which was output by 'ls', and that produces strangely
inconsistent results when the file name includes raw bytes: sometimes
we get the 2-byte sequence starting with \300, sometimes the original
byte survives unchanged, and sometimes I see the sequence \301\200
instead of a lone \300 in the file name.  I'm trying to understand
what's going on and find a solution to that.

CC'ing Handa-san in the hope that he could comment on this or provide
some suggestions.
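The sequences described here follow from how Emacs stores raw bytes in multibyte text. As far as src/character.h describes it, a raw byte 0x80..0xFF is kept as a 2-byte sequence whose first byte is 0xC0 (for 0x80..0xBF) or 0xC1 (for 0xC0..0xFF), followed by 0x80 plus the byte's low six bits. The C sketch below models that mapping with hypothetical helper names; it shows why the raw byte \265 turns into \300\265, and a raw \300 into \301\200, when the internal form is mistaken for two separate raw bytes.

```c
#include <assert.h>

/* Hypothetical model of how multibyte text stores raw byte B
   (0x80..0xFF): head 0xC0 for 0x80..0xBF, 0xC1 for 0xC0..0xFF, then
   0x80 | (low six bits of B).  Writes two bytes to OUT, returns 2.  */
static int
raw_byte_to_internal (unsigned char b, unsigned char out[2])
{
  assert (b >= 0x80);
  out[0] = (unsigned char) (0xC0 | ((b >> 6) & 0x01));
  out[1] = (unsigned char) (0x80 | (b & 0x3F));
  return 2;
}

/* Inverse of raw_byte_to_internal; IN must start with 0xC0 or 0xC1.  */
static unsigned char
internal_to_raw_byte (const unsigned char in[2])
{
  assert (in[0] == 0xC0 || in[0] == 0xC1);
  return (unsigned char) (0x80 | ((in[0] & 0x01) << 6) | (in[1] & 0x3F));
}
```

Reading the internal pair 0xC0 0xB5 back as two independent raw bytes yields exactly the \300\265 seen in the original error message.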




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sat, 11 Nov 2017 15:22:02 GMT)

Message #38 received at 29189 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: vianchielfaura <at> gmail.com
Cc: handa <at> gnu.org, 29189 <at> debbugs.gnu.org, schwab <at> suse.de
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sat, 11 Nov 2017 17:21:21 +0200
> Date: Sat, 11 Nov 2017 16:18:20 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: schwab <at> suse.de, 29189 <at> debbugs.gnu.org
> 
> The problem is in insert-directory.  It manually decodes each file
> name which was output by 'ls', and that produces strangely
> inconsistent results when the file name includes raw bytes: sometimes
> we get the 2-byte sequence starting with \300, sometimes the original
> byte survives unchanged, and sometimes I see the sequence \301\200
> instead of a lone \300 in the file name.  I'm trying to understand
> what's going on and find a solution to that.

Can you please try the patch below?  (You will need to re-dump Emacs
after patching files.el.)

diff --git a/lisp/files.el b/lisp/files.el
index b47411f..43198bc 100644
--- a/lisp/files.el
+++ b/lisp/files.el
@@ -6803,10 +6803,13 @@ insert-directory
 			    val (get-text-property (point) 'dired-filename))
 		      (goto-char (next-single-property-change
 				  (point) 'dired-filename nil (point-max)))
-		      ;; Force no eol conversion on a file name, so
-		      ;; that CR is preserved.
-		      (decode-coding-region pos (point)
-					    (if val coding-no-eol coding))
+                      (let ((fn (buffer-substring-no-properties pos (point))))
+                        (delete-region pos (point))
+                        (insert
+		         ;; Force no eol conversion on a file name, so
+		         ;; that CR is preserved.
+		         (decode-coding-string (string-make-unibyte fn)
+					       (if val coding-no-eol coding))))
 		      (if val
 			  (put-text-property pos (point)
 					     'dired-filename t)))))))
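The files.el patch works around the problem by converting the file name to unibyte before decoding it, so that each raw-byte character is reduced to the single byte it stands for. A minimal C sketch of that conversion step follows. It is restricted to ASCII and raw bytes, assumes the 0xC0/0xC1 internal form described in src/character.h, and uses the hypothetical name make_unibyte; the real string-make-unibyte also handles other characters, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the conversion `string-make-unibyte' performs on
   the characters relevant here: copy multibyte text SRC (NBYTES long) to
   OUT, collapsing each 2-byte raw-byte form (head 0xC0 or 0xC1) into the
   single byte it represents.  Only ASCII and raw bytes are modelled; OUT
   needs room for NBYTES bytes.  Returns the number of bytes written.  */
static size_t
make_unibyte (const unsigned char *src, size_t nbytes, unsigned char *out)
{
  assert (src != NULL && out != NULL);
  size_t n = 0;
  for (size_t i = 0; i < nbytes; i++)
    {
      if ((src[i] == 0xC0 || src[i] == 0xC1) && i + 1 < nbytes)
        {
          /* 0xC0 xx -> 0x80..0xBF, 0xC1 xx -> 0xC0..0xFF.  */
          out[n++] = (unsigned char) (0x80 | ((src[i] & 0x01) << 6)
                                      | (src[i + 1] & 0x3F));
          i++;
        }
      else
        out[n++] = src[i];  /* ASCII; other characters are outside this sketch.  */
    }
  return n;
}
```

Applied to the internal bytes of "tmp/" followed by \300\265, this produces "tmp/" followed by the single byte \265, which a decoder can then pass through unchanged.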




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Thu, 16 Nov 2017 06:32:02 GMT)

Message #41 received at 29189 <at> debbugs.gnu.org:

From: Allen Li <vianchielfaura <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: handa <at> gnu.org, 29189 <at> debbugs.gnu.org, Andreas Schwab <schwab <at> suse.de>
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Wed, 15 Nov 2017 22:31:48 -0800
On Sat, Nov 11, 2017 at 7:21 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> Date: Sat, 11 Nov 2017 16:18:20 +0200
>> From: Eli Zaretskii <eliz <at> gnu.org>
>> Cc: schwab <at> suse.de, 29189 <at> debbugs.gnu.org
>>
>> The problem is in insert-directory.  It manually decodes each file
>> name which was output by 'ls', and that produces strangely
>> inconsistent results when the file name includes raw bytes: sometimes
>> we get the 2-byte sequence starting with \300, sometimes the original
>> byte survives unchanged, and sometimes I see the sequence \301\200
>> instead of a lone \300 in the file name.  I'm trying to understand
>> what's going on and find a solution to that.
>
> Can you please try the patch below?  (You will need to re-dump Emacs
> after patching files.el.)
>
> [files.el patch snipped]

This patch works for me.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Thu, 16 Nov 2017 16:01:02 GMT)

Message #44 received at 29189 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Allen Li <vianchielfaura <at> gmail.com>
Cc: handa <at> gnu.org, 29189 <at> debbugs.gnu.org, schwab <at> suse.de
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Thu, 16 Nov 2017 18:00:55 +0200
> From: Allen Li <vianchielfaura <at> gmail.com>
> Date: Wed, 15 Nov 2017 22:31:48 -0800
> Cc: handa <at> gnu.org, Andreas Schwab <schwab <at> suse.de>, 29189 <at> debbugs.gnu.org
> 
> > [files.el patch snipped]
> 
> This patch works for me.

Thanks for testing.  I'm still worried that we need to force text to
be unibyte in order for the decoding to work.  So I'd like to dig into
the code to understand why, and maybe try to fix it if I find some
problems there.  If I succeed, the result will work faster, because
the above patch is less efficient than decode-coding-region.  Let me
look into this and get back to you in a few days.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sat, 18 Nov 2017 14:43:01 GMT)

Message #47 received at 29189 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org, schwab <at> suse.de
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sat, 18 Nov 2017 16:42:22 +0200
> Date: Thu, 16 Nov 2017 18:00:55 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 29189 <at> debbugs.gnu.org, schwab <at> suse.de
> 
> > From: Allen Li <vianchielfaura <at> gmail.com>
> > Date: Wed, 15 Nov 2017 22:31:48 -0800
> > Cc: handa <at> gnu.org, Andreas Schwab <schwab <at> suse.de>, 29189 <at> debbugs.gnu.org
> > 
> > > [files.el patch snipped]
> > 
> > This patch works for me.
> 
> Thanks for testing.  I'm still worried that we need to force text to
> be unibyte in order for the decoding to work.  So I'd like to dig into
> the code to understand why, and maybe try to fix it if I find some
> problems there.  If I succeed, the result will work faster, because
> the above patch is less efficient that decode-coding-region.  Let me
> look into this and get back to you in a few days.

I found that the alternative patch below solves the original problem
without any changes needed in files.el, and without introducing any
performance hits.  Does anyone see a problem with this proposed patch?
Kenichi?

diff --git a/src/coding.c b/src/coding.c
index d790ad0..eaad0d7 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -7423,10 +7423,21 @@ decode_coding (struct coding_system *coding)
 
 	  while (nbytes-- > 0)
 	    {
-	      int c = *src++;
+	      int c;
 
-	      if (c & 0x80)
-		c = BYTE8_TO_CHAR (c);
+	      /* Copy raw bytes in their 2-byte forms as single characters.  */
+	      if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
+		{
+		  c = STRING_CHAR_ADVANCE (src);
+		  nbytes--;
+		}
+	      else
+		{
+		  c = *src++;
+
+		  if (c & 0x80)
+		    c = BYTE8_TO_CHAR (c);
+		}
 	      coding->charbuf[coding->charbuf_used++] = c;
 	    }
 	  produce_chars (coding, Qnil, 1);




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Mon, 20 Nov 2017 09:49:01 GMT)

Message #50 received at 29189 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> suse.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Kenichi Handa <handa <at> gnu.org>, 29189 <at> debbugs.gnu.org,
 vianchielfaura <at> gmail.com
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Mon, 20 Nov 2017 10:48:09 +0100
On Nov 18 2017, Eli Zaretskii <eliz <at> gnu.org> wrote:

> I found that the alternative patch below solves the original problem
> without any changes needed in files.el, and without introducing any
> performance hits.  Does anyone see a problem with this proposed patch?
> Kenichi?
>
> diff --git a/src/coding.c b/src/coding.c
> index d790ad0..eaad0d7 100644
> --- a/src/coding.c
> +++ b/src/coding.c
> @@ -7423,10 +7423,21 @@ decode_coding (struct coding_system *coding)
>  
>  	  while (nbytes-- > 0)
>  	    {
> -	      int c = *src++;
> +	      int c;
>  
> -	      if (c & 0x80)
> -		c = BYTE8_TO_CHAR (c);
> +	      /* Copy raw bytes in their 2-byte forms as single characters.  */
> +	      if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> +		{
> +		  c = STRING_CHAR_ADVANCE (src);

CHAR_BYTE8_HEAD_P and STRING_CHAR_ADVANCE are only valid for multibyte
strings.  I don't think it makes sense to use them for unibyte strings.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab <at> suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Mon, 20 Nov 2017 18:17:02 GMT)

Message #53 received at 29189 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Schwab <schwab <at> suse.de>, handa <at> gnu.org
Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Mon, 20 Nov 2017 20:15:55 +0200
> From: Andreas Schwab <schwab <at> suse.de>
> Cc: Kenichi Handa <handa <at> gnu.org>,  vianchielfaura <at> gmail.com,  29189 <at> debbugs.gnu.org
> Date: Mon, 20 Nov 2017 10:48:09 +0100
> 
> > +	      /* Copy raw bytes in their 2-byte forms as single characters.  */
> > +	      if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > +		{
> > +		  c = STRING_CHAR_ADVANCE (src);
> 
> CHAR_BYTE8_HEAD_P and STRING_CHAR_ADVANCE are only valid for multibyte
> strings.  I don't think it makes sense to use them for unibyte strings.

Right you are, thanks.  Updated patch below.

diff --git a/src/coding.c b/src/coding.c
index d790ad0..ac55f87 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -7423,10 +7423,23 @@ decode_coding (struct coding_system *coding)
 
 	  while (nbytes-- > 0)
 	    {
-	      int c = *src++;
+	      int c;
 
-	      if (c & 0x80)
-		c = BYTE8_TO_CHAR (c);
+	      /* Copy raw bytes in their 2-byte forms from multibyte
+		 text as single characters.  */
+	      if (coding->src_multibyte
+		  && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
+		{
+		  c = STRING_CHAR_ADVANCE (src);
+		  nbytes--;
+		}
+	      else
+		{
+		  c = *src++;
+
+		  if (c & 0x80)
+		    c = BYTE8_TO_CHAR (c);
+		}
 	      coding->charbuf[coding->charbuf_used++] = c;
 	    }
 	  produce_chars (coding, Qnil, 1);
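The effect of the updated coding.c patch can be modelled in a few lines of C. The sketch below mimics the decode_coding loop that the patch changes, with a flag to switch the fix on or off. BYTE8_TO_CHAR and CHAR_BYTE8_HEAD_P are redefined here as simplified stand-ins for the character.h macros of the same names (assuming raw byte B is character B + 0x3FFF00), and byte8_char_advance is a hypothetical replacement for STRING_CHAR_ADVANCE that only understands the 0xC0/0xC1 raw-byte form. None of this is the actual Emacs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the character.h macros used in the patch.  */
#define BYTE8_TO_CHAR(byte) ((byte) + 0x3FFF00)
#define CHAR_BYTE8_HEAD_P(byte) ((byte) == 0xC0 || (byte) == 0xC1)

/* Hypothetical replacement for STRING_CHAR_ADVANCE that only handles the
   2-byte raw-byte form: decode the character at *SRC and advance *SRC.  */
static int
byte8_char_advance (const unsigned char **src)
{
  const unsigned char *p = *src;
  assert (CHAR_BYTE8_HEAD_P (p[0]));
  *src += 2;
  return BYTE8_TO_CHAR (0x80 | ((p[0] & 0x01) << 6) | (p[1] & 0x3F));
}

/* Model of the loop that turns leftover source bytes into characters.
   FIXED selects the patched behaviour.  Stores characters in OUT (room
   for NBYTES entries) and returns how many were stored.  */
static size_t
flush_raw (const unsigned char *src, ptrdiff_t nbytes, bool src_multibyte,
           bool fixed, int *out)
{
  size_t n = 0;
  while (nbytes-- > 0)
    {
      int c;
      if (fixed && src_multibyte && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
        {
          /* Patched: one character from the 2-byte raw-byte form.  */
          c = byte8_char_advance (&src);
          nbytes--;
        }
      else
        {
          /* Original: every byte >= 0x80 becomes its own raw byte.  */
          c = *src++;
          if (c & 0x80)
            c = BYTE8_TO_CHAR (c);
        }
      out[n++] = c;
    }
  return n;
}
```

With a multibyte source, the original loop turns the internal form of \265 into two characters (\300 and \265), while the patched loop yields a single \265. With a unibyte source the same two bytes are genuine raw bytes and stay separate, which is the case the src_multibyte test guards.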




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Fri, 24 Nov 2017 08:53:02 GMT)

Message #56 received at 29189 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Fri, 24 Nov 2017 10:52:19 +0200
Ping!  Kenichi, any comments on this issue or the proposed patch?

> Date: Mon, 20 Nov 2017 20:15:55 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
> 
> [Andreas's review and the updated coding.c patch snipped]




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Fri, 01 Dec 2017 08:43:02 GMT)

Message #59 received at 29189 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: handa <at> gnu.org
Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Fri, 01 Dec 2017 10:41:47 +0200
Ping!  Ping!  I'd _really_ like to fix this for Emacs 26, but I'm
bothered by the potential adverse consequences of making changes in
such a central piece of code.

Still hoping to get comments from Handa-san.

> Date: Fri, 24 Nov 2017 10:52:19 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
> 
> Ping!  Kenichi, any comments on this issue or the proposed patch?
> 
> > [Andreas's review and the updated coding.c patch snipped]




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sat, 02 Dec 2017 05:22:01 GMT)

Message #62 received at 29189 <at> debbugs.gnu.org:

From: Allen Li <vianchielfaura <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Andreas Schwab <schwab <at> suse.de>, 29189 <at> debbugs.gnu.org, handa <at> gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Fri, 1 Dec 2017 21:21:14 -0800
On Mon, Nov 20, 2017 at 10:15 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> Right you are, thanks.  Updated patch below.
>
> diff --git a/src/coding.c b/src/coding.c
> index d790ad0..ac55f87 100644
> --- a/src/coding.c
> +++ b/src/coding.c
> @@ -7423,10 +7423,23 @@ decode_coding (struct coding_system *coding)
>
>           while (nbytes-- > 0)
>             {
> -             int c = *src++;
> +             int c;
>
> -             if (c & 0x80)
> -               c = BYTE8_TO_CHAR (c);
> +             /* Copy raw bytes in their 2-byte forms from multibyte
> +                text as single characters.  */
> +             if (coding->src_multibyte
> +                 && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> +               {
> +                 c = STRING_CHAR_ADVANCE (src);
> +                 nbytes--;
> +               }
> +             else
> +               {
> +                 c = *src++;
> +
> +                 if (c & 0x80)
> +                   c = BYTE8_TO_CHAR (c);
> +               }
>               coding->charbuf[coding->charbuf_used++] = c;
>             }
>           produce_chars (coding, Qnil, 1);

I applied this patch to master (0b6f4f2c60) and it fixes the bug and
doesn't crash Emacs immediately.  The code also looks right, but I am
not familiar with Emacs's C code.  A few questions.

Why do we have to handle multibyte strings in this function
decode_coding?  (I found the answer in the docs)

Can you briefly explain how Emacs internally stores unibyte and
multibyte strings?  (I found the answer in character.h)

After doing the above research, I can more confidently say this is
right, but having an expert opinion would be nice.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sat, 02 Dec 2017 09:02:02 GMT) Full text and rfc822 format available.

Message #65 received at 29189 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Allen Li <vianchielfaura <at> gmail.com>
Cc: schwab <at> suse.de, 29189 <at> debbugs.gnu.org, handa <at> gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sat, 02 Dec 2017 11:01:01 +0200
> From: Allen Li <vianchielfaura <at> gmail.com>
> Date: Fri, 1 Dec 2017 21:21:14 -0800
> Cc: Andreas Schwab <schwab <at> suse.de>, handa <at> gnu.org, 29189 <at> debbugs.gnu.org
> 
> I applied this patch to master (0b6f4f2c60) and it fixes the bug and
> doesn't crash Emacs immediately.  The code also looks right, but I am
> not familiar with Emacs's C code.  A few questions.
> 
> Why do we have to handle multibyte strings in this function
> decode_coding?  (I found the answer in the docs)

decode_coding needs to work when a series of raw bytes is inserted
into a multibyte buffer (which happens in the Dired case).

> Can you briefly explain how Emacs internally stores unibyte and
> multibyte strings?  (I found the answer in character.h)

Right, the details are in that header file.

> After doing the above research, I can more confidently say this is
> right, but having an expert opinion would be nice.

Thanks for proofreading the patch and testing it.  I'm waiting for
Handa-san to comment on it.
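Editor's note on Allen's storage question: as we read character.h in Emacs 26, a raw byte B (0x80..0xFF) is the character B + 0x3FFF00, one of 0x3FFF80..0x3FFFFF. In multibyte text that character occupies two bytes: a lead byte 0xC0 or 0xC1, which are never valid lead bytes in real UTF-8, followed by a trail byte. The sketch below shows the conversion in both directions under those assumptions. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed, simplified from character.h: raw byte B (0x80..0xFF) is the
   character B + 0x3FFF00, i.e. one of 0x3FFF80..0x3FFFFF.  */
#define BYTE8_TO_CHAR(byte) ((byte) + 0x3FFF00)
#define CHAR_TO_BYTE8(c) ((c) - 0x3FFF00)

/* Store the 2-byte multibyte form of raw byte B in P.  */
static void
raw_byte_to_multibyte (int b, unsigned char p[2])
{
  assert (b >= 0x80 && b <= 0xFF);
  int c = BYTE8_TO_CHAR (b);
  p[0] = 0xC0 | ((c >> 6) & 0x01);   /* lead byte: 0xC0 or 0xC1 */
  p[1] = 0x80 | (c & 0x3F);          /* trail byte */
}

/* Recover the raw byte from its 2-byte multibyte form.  */
static int
multibyte_to_raw_byte (const unsigned char p[2])
{
  assert (p[0] == 0xC0 || p[0] == 0xC1);
  int c = ((p[0] & 0x1F) << 6) | (p[1] & 0x3F) | 0x3FFF80;
  return CHAR_TO_BYTE8 (c);
}
```

Under this scheme, the raw byte \265 (0xB5) is stored in a multibyte buffer as the two bytes \300\265 (0xC0 0xB5), which matches the file name in the original error message. The raw byte 0xF5 becomes 0xC1 0xB5.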




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sat, 09 Dec 2017 09:05:02 GMT) Full text and rfc822 format available.

Message #68 received at 29189 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: handa <at> gnu.org
Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sat, 09 Dec 2017 11:03:57 +0200
Ping!  Ping!  Ping!

> Date: Fri, 01 Dec 2017 10:41:47 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
> 
> Ping!  Ping!  I'd _really_ like to fix this for Emacs 26, but I'm
> bothered by the potential adverse consequences of making changes in
> such a central piece of code.
> 
> Still hoping to get comments from Handa-san.
> 
> > Date: Fri, 24 Nov 2017 10:52:19 +0200
> > From: Eli Zaretskii <eliz <at> gnu.org>
> > Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
> > 
> > Ping!  Kenichi, any comments on this issue or the proposed patch?
> > 
> > > Date: Mon, 20 Nov 2017 20:15:55 +0200
> > > From: Eli Zaretskii <eliz <at> gnu.org>
> > > Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
> > > 
> > > > From: Andreas Schwab <schwab <at> suse.de>
> > > > Cc: Kenichi Handa <handa <at> gnu.org>,  vianchielfaura <at> gmail.com,  29189 <at> debbugs.gnu.org
> > > > Date: Mon, 20 Nov 2017 10:48:09 +0100
> > > > 
> > > > > +	      /* Copy raw bytes in their 2-byte forms as single characters.  */
> > > > > +	      if (CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > > > > +		{
> > > > > +		  c = STRING_CHAR_ADVANCE (src);
> > > > 
> > > > CHAR_BYTE8_HEAD_P and STRING_CHAR_ADVANCE are only valid for multibyte
> > > > strings.  I don't think it makes sense to use them for unibyte strings.
> > > 
> > > Right you are, thanks.  Updated patch below.
> > > 
> > > diff --git a/src/coding.c b/src/coding.c
> > > index d790ad0..ac55f87 100644
> > > --- a/src/coding.c
> > > +++ b/src/coding.c
> > > @@ -7423,10 +7423,23 @@ decode_coding (struct coding_system *coding)
> > >  
> > >  	  while (nbytes-- > 0)
> > >  	    {
> > > -	      int c = *src++;
> > > +	      int c;
> > >  
> > > -	      if (c & 0x80)
> > > -		c = BYTE8_TO_CHAR (c);
> > > +	      /* Copy raw bytes in their 2-byte forms from multibyte
> > > +		 text as single characters.  */
> > > +	      if (coding->src_multibyte
> > > +		  && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
> > > +		{
> > > +		  c = STRING_CHAR_ADVANCE (src);
> > > +		  nbytes--;
> > > +		}
> > > +	      else
> > > +		{
> > > +		  c = *src++;
> > > +
> > > +		  if (c & 0x80)
> > > +		    c = BYTE8_TO_CHAR (c);
> > > +		}
> > >  	      coding->charbuf[coding->charbuf_used++] = c;
> > >  	    }
> > >  	  produce_chars (coding, Qnil, 1);
> > > 
> > > 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Fri, 15 Dec 2017 09:10:03 GMT) Full text and rfc822 format available.

Message #71 received at 29189 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: handa <at> gnu.org
Cc: vianchielfaura <at> gmail.com, 29189 <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Fri, 15 Dec 2017 11:09:48 +0200
Ping!  Ping!  Ping!  Ping!





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Fri, 05 Jan 2018 22:17:02 GMT) Full text and rfc822 format available.

Message #74 received at 29189 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Kenichi Handa <handa <at> gnu.org>, 29189 <at> debbugs.gnu.org, schwab <at> suse.de,
 vianchielfaura <at> gmail.com
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Fri, 05 Jan 2018 17:16:30 -0500
> I found that the alternative patch below solves the original problem
> without any changes needed in files.el, and without introducing any
> performance hits.  Does anyone see a problem with this proposed patch?

[ With Andreas's adjustment]  It looks sane to me.

Decoding applied to multibyte text is a rather odd situation (tho I'm
surprised this problem hasn't been noticed until now).
It should very much come with a few tests to verify that
(decode-coding-string (string-to-multibyte (encode-coding-string X)))
is just a more expensive alternative to
(decode-coding-string (encode-coding-string X)).

I'd also be tempted to additionally signal an error if a non-byte
(i.e. a char that's neither ASCII nor eight-bit-byte) is found since
"decoding" in that case is meaningless.  Tho this is obviously not for
the emacs-26 branch.


        Stefan
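Editor's note: Stefan's suggested test is stated in terms of Lisp functions, which in real code also take a CODING-SYSTEM argument. At the level of the patched tail loop alone, the matching property is that decoding a raw byte from unibyte input gives the same characters as decoding its 2-byte multibyte form. The sketch below checks this for every raw byte. It uses the same simplified, assumed macros. The helpers decode_tail and roundtrip_holds are hypothetical, and a real regression test would more likely be written in Lisp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (assumed) for character.h, raw bytes only.  */
#define BYTE8_TO_CHAR(byte) ((byte) + 0x3FFF00)
#define CHAR_BYTE8_HEAD_P(byte) ((byte) == 0xC0 || (byte) == 0xC1)

/* The patched tail loop from decode_coding, simplified.  */
static size_t
decode_tail (const unsigned char *src, size_t nbytes, bool src_multibyte,
             int *out)
{
  size_t n = 0;
  while (nbytes-- > 0)
    {
      int c;
      if (src_multibyte && CHAR_BYTE8_HEAD_P (*src) && nbytes > 0)
        {
          /* 2-byte raw-byte form: one character.  */
          c = ((src[0] & 0x1F) << 6) | (src[1] & 0x3F) | 0x3FFF80;
          src += 2;
          nbytes--;
        }
      else
        {
          c = *src++;
          if (c & 0x80)
            c = BYTE8_TO_CHAR (c);
        }
      out[n++] = c;
    }
  return n;
}

/* For every raw byte B, decoding "aBb" as unibyte text and decoding its
   multibyte form (B stored as two bytes) must give the same characters.  */
static bool
roundtrip_holds (void)
{
  for (int b = 0x80; b <= 0xFF; b++)
    {
      unsigned char uni[3] = { 'a', (unsigned char) b, 'b' };
      unsigned char multi[4] = { 'a', (unsigned char) (0xC0 | ((b >> 6) & 1)),
                                 (unsigned char) (0x80 | (b & 0x3F)), 'b' };
      int u[4], m[4];
      size_t nu = decode_tail (uni, sizeof uni, false, u);
      size_t nm = decode_tail (multi, sizeof multi, true, m);
      assert (nu == 3);   /* unibyte input: one character per byte */
      if (nm != nu)
        return false;
      for (size_t i = 0; i < nu; i++)
        if (u[i] != m[i])
          return false;
    }
  return true;
}
```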




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sat, 06 Jan 2018 16:06:01 GMT) Full text and rfc822 format available.

Message #77 received at 29189 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: handa <at> gnu.org, 29189 <at> debbugs.gnu.org, schwab <at> suse.de,
 vianchielfaura <at> gmail.com
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sat, 06 Jan 2018 18:04:56 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: handa <at> gnu.org,  vianchielfaura <at> gmail.com,  29189-don <at> debbugs.gnu.org,  schwab <at> suse.de
> Date: Sat, 06 Jan 2018 10:20:38 -0500
> 
> > Situations where file names are not valid byte sequences for the
> > locale's codeset are rare.
> 
> Hmm... then I think I have misunderstood something: why doesn't this
> problem show up with a valid name like "λ" ?

Because "λ" will be inserted by 'ls' as a valid UTF-8 sequence of raw
bytes, and will be correctly decoded.  By contrast, \265 is not a
valid UTF-8 sequence, so we need it to produce a string of a single
raw byte.
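Editor's note: Eli's explanation can be checked against the UTF-8 rules. "λ" (U+03BB) is the well-formed 2-byte sequence CE BB. A lone 0xB5, by contrast, is a continuation byte with no lead byte, so a UTF-8 decoder cannot turn it into a character and it has to survive as a raw byte. The simplified well-formedness check below illustrates this. valid_utf8 is a hypothetical helper and does not reflect how Emacs's decoder works internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified UTF-8 well-formedness check, for illustration only: it
   rejects stray continuation bytes, 0xC0/0xC1 and 0xF5..0xFF lead bytes,
   and truncated sequences, but not 3- and 4-byte overlong forms, UTF-16
   surrogates, or code points above U+10FFFF.  */
static bool
valid_utf8 (const unsigned char *s, size_t n)
{
  assert (s != NULL || n == 0);
  size_t i = 0;
  while (i < n)
    {
      unsigned char b = s[i];
      size_t len;
      if (b < 0x80)
        len = 1;
      else if (b >= 0xC2 && b <= 0xDF)
        len = 2;
      else if (b >= 0xE0 && b <= 0xEF)
        len = 3;
      else if (b >= 0xF0 && b <= 0xF4)
        len = 4;
      else
        return false;              /* not a valid lead byte */
      if (len > n - i)
        return false;              /* truncated sequence */
      for (size_t k = 1; k < len; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;            /* expected a continuation byte */
      i += len;
    }
  return true;
}
```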

> >> I'd also be tempted to additionally signal an error if a non-byte
> >> (i.e. a char that's neither ASCII nor eight-bit-byte) is found since
> >> "decoding" in that case is meaningless.
> > I don't think I understand.  A given byte sequence can either
> > represent a decodable character or be a raw byte.  What third
> > possibility did you have in mind?
> 
> If the input is from a multibyte text, the input is not a byte sequence
> but a character sequence, so the third possibility is to have a non-byte
> in those characters.

Input that is from multibyte text can include raw bytes in their
multibyte representation.  What is a "non-byte"?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sun, 07 Jan 2018 15:21:02 GMT) Full text and rfc822 format available.

Message #80 received at 29189 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: handa <at> gnu.org, 29189 <at> debbugs.gnu.org, schwab <at> suse.de,
 vianchielfaura <at> gmail.com
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sun, 07 Jan 2018 10:20:06 -0500
> Input that is from multibyte text can include raw bytes in their
> multibyte representation.  What is a "non-byte"?

`λ`?  `é`?  `²`?


        Stefan
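Editor's note: Stefan's "non-byte" can be written as a predicate on characters. It covers anything that is neither ASCII nor one of the raw-byte characters 0x3FFF80..0x3FFFFF. The sketch below assumes the character.h ranges, and byte_char_p is a hypothetical name. Because it works on already-decoded characters, it does not by itself answer the question raised in the next message of how such characters would be recognized inside the byte-level loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed, from character.h: the largest character code is 0x3FFFFF,
   and codes above 0x3FFF7F are raw bytes (eight-bit characters).  */
#define MAX_CHAR 0x3FFFFF
#define MAX_5_BYTE_CHAR 0x3FFF7F
#define CHAR_BYTE8_P(c) ((c) > MAX_5_BYTE_CHAR)

/* True if C is a "byte" in Stefan's sense: ASCII or a raw byte.
   Anything else (e.g. U+03BB, U+00E9, U+00B2) is a "non-byte".  */
static bool
byte_char_p (int c)
{
  assert (c >= 0 && c <= MAX_CHAR);
  return c < 0x80 || CHAR_BYTE8_P (c);
}
```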




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sun, 07 Jan 2018 17:55:02 GMT) Full text and rfc822 format available.

Message #83 received at 29189 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: handa <at> gnu.org, 29189 <at> debbugs.gnu.org, schwab <at> suse.de,
 vianchielfaura <at> gmail.com
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sun, 07 Jan 2018 19:53:33 +0200
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: handa <at> gnu.org,  vianchielfaura <at> gmail.com,  29189 <at> debbugs.gnu.org,  schwab <at> suse.de
> Date: Sun, 07 Jan 2018 10:20:06 -0500
> 
> > Input that is from multibyte text can include raw bytes in their
> > multibyte representation.  What is a "non-byte"?
> 
> `λ`?  `é`?  `²`?

These are characters, so they will never end up in the code fragment
that was the subject of this bug, because that code deals specifically
with the unprocessed tail of byte stream that could not be decoded.
They will be decoded with the rest of the text before we get to the
code being discussed.

And even if you would like to add an assertion there for "cannot happen"
stuff, how would you identify such "non-bytes"?  Their byte sequences
depend on the original encoding, so detecting them sounds like your
favorite Turing halting problem, no?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29189; Package emacs. (Sun, 09 Sep 2018 00:32:01 GMT) Full text and rfc822 format available.

Message #86 received at 29189 <at> debbugs.gnu.org (full text, mbox):

From: Allen Li <darkfeline <at> felesatra.moe>
To: 29189 <at> debbugs.gnu.org
Subject: Re: 25.3; Dired does not work with binary filenames
Date: Sat, 8 Sep 2018 17:31:14 -0700
I believe this bug was fixed in 26?




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sun, 09 Sep 2018 06:13:02 GMT) Full text and rfc822 format available.

Notification sent to Allen Li <vianchielfaura <at> gmail.com>:
bug acknowledged by developer. (Sun, 09 Sep 2018 06:13:02 GMT) Full text and rfc822 format available.

Message #91 received at 29189-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Allen Li <darkfeline <at> felesatra.moe>
Cc: 29189-done <at> debbugs.gnu.org
Subject: Re: bug#29189: 25.3; Dired does not work with binary filenames
Date: Sun, 09 Sep 2018 09:12:28 +0300
> From: Allen Li <darkfeline <at> felesatra.moe>
> Date: Sat, 8 Sep 2018 17:31:14 -0700
> 
> I believe this bug was fixed in 26?

Yes.  I even sent a message to that effect at the time I pushed the
changes, but I see now that I goofed with the bug address, so neither
the message nor the instruction to close the bug made it to the bug
tracker.  Reproducing the important part below:

   > From: Stefan Monnier <monnier <at> iro.umontreal.ca>
   > Cc: Kenichi Handa <handa <at> gnu.org>,  vianchielfaura <at> gmail.com,  29189 <at> debbugs.gnu.org,  schwab <at> suse.de
   > Date: Fri, 05 Jan 2018 17:16:30 -0500
   >
   > > I found that the alternative patch below solves the original problem
   > > without any changes needed in files.el, and without introducing any
   > > performance hits.  Does anyone see a problem with this proposed patch?
   >
   > [ With Andreas's adjustment]  It looks sane to me.

   I'd still want Handa-san's review of the patch, but I went ahead and
   pushed it to the emacs-26 branch, and I'm closing the bug.

Thanks for pointing out this blunder.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 07 Oct 2018 11:24:04 GMT) Full text and rfc822 format available.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.