From unknown Sat Jun 21 12:13:22 2025
From: Stefan Monnier
To: 56469@debbugs.gnu.org
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sat, 09 Jul 2022 13:44:52 -0400

Package: Emacs
Version: 29.0.50

If you have
a directory named "/tmp/\303a" with a file named "fée" inside, then
(directory-files "/tmp/\303a" 'full) is likely to return a funny string
which is multibyte but contains an invalid UTF-8 sequence (its bytes
spell "/tmp/\303a/f\303\251e").
That string seems to be printed as "/tmp/¡/fée", which corresponds
to "/tmp/\303\241/f\303\251e".

Such a string with an invalid UTF-8 sequence is handled quite gracefully
by Emacs, so I wasn't able to get an actual crash out of it, but it's
still something we should avoid.

I suggest the patch below. In a comment I suggest we don't try to use
unibyte strings when a multibyte string would work as well. This is
because for those ASCII-only strings, it's cheaper to test bytes==chars
to (re)discover that they are ASCII-only (when they're multibyte) than
having to loop through the bytes (when they're unibyte).


        Stefan


diff --git a/src/dired.c b/src/dired.c
index 6bb8c2fcb9f..33ddfafd8e7 100644
--- a/src/dired.c
+++ b/src/dired.c
@@ -219,6 +219,13 @@ directory_files_internal (Lisp_Object directory, Lisp_Object full,
     }
 #endif
 
+  if (!NILP (full) && !STRING_MULTIBYTE (directory))
+    { /* We will be concatenating 'directory' with local file name.
+         We always decode local file names, so in order to safely concatenate
+         them we need 'directory' to be multibyte.  */
+      directory = Fstring_to_multibyte (directory);
+    }
+
   ptrdiff_t directory_nbytes = SBYTES (directory);
   re_match_object = Qt;
 
@@ -263,9 +270,10 @@ directory_files_internal (Lisp_Object directory, Lisp_Object full,
           ptrdiff_t name_nbytes = SBYTES (name);
           ptrdiff_t nbytes = directory_nbytes + needsep + name_nbytes;
           ptrdiff_t nchars = SCHARS (directory) + needsep + SCHARS (name);
-          finalname = make_uninit_multibyte_string (nchars, nbytes);
-          if (nchars == nbytes)
-            STRING_SET_UNIBYTE (finalname);
+          /* FIXME: Why not make them all multibyte?  */
+          finalname = (nchars == nbytes)
+            ? make_uninit_string (nbytes)
+            : make_uninit_multibyte_string (nchars, nbytes);
           memcpy (SDATA (finalname), SDATA (directory), directory_nbytes);
           if (needsep)
             SSET (finalname, directory_nbytes, DIRECTORY_SEP);

From unknown Sat Jun 21 12:13:22 2025
From: Eli Zaretskii
To: Stefan Monnier
Cc: 56469@debbugs.gnu.org
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sat, 09 Jul 2022 21:17:22 +0300
Message-Id: <83y1x2177x.fsf@gnu.org>

> Date: Sat, 09 Jul 2022 13:44:52 -0400
> From: Stefan Monnier via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors"
>
> If you have a directory named "/tmp/\303a" with a file named "fée"
> inside, then (directory-files "/tmp/\303a" 'full) is likely to return
> a funny string which is multibyte but contains an invalid
> UTF-8 sequence (its bytes spell "/tmp/\303a/f\303\251e").
> That string seems to be printed as "/tmp/¡/fée", which corresponds
> to "/tmp/\303\241/f\303\251e".
>
> Such a string with an invalid UTF-8 sequence is handled quite gracefully
> by Emacs, so I wasn't able to get an actual crash out of it, but it's
> still something we should avoid.
>
> I suggest the patch below. In a comment I suggest we don't try to use
> unibyte strings when a multibyte string would work as well. This is
> because for those ASCII-only strings, it's cheaper to test bytes==chars
> to (re)discover that they are ASCII-only (when they're multibyte) than
> having to loop through the bytes (when they're unibyte).

Please bootstrap Emacs in a directory with such a name, and if that
works, I'm okay with installing this change.
Thanks.

From unknown Sat Jun 21 12:13:22 2025
From: Stefan Monnier
To: Eli Zaretskii
Cc: 56469@debbugs.gnu.org
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sat, 09 Jul 2022 14:20:37 -0400
In-Reply-To: <83y1x2177x.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 09 Jul 2022 21:17:22 +0300")

>> I suggest the patch below. In a comment I suggest we don't try to use
>> unibyte strings when a multibyte string would work as well. This is
>> because for those ASCII-only strings, it's cheaper to test bytes==chars
>> to (re)discover that they are ASCII-only (when they're multibyte) than
>> having to loop through the bytes (when they're unibyte).
>
> Please bootstrap Emacs in a directory with such a name, and if that
> works, I'm okay with installing this change.

Just to clarify: by "this change" you refer to the change in the patch
or the change suggested in the comment?

        Stefan

From unknown Sat Jun 21 12:13:22 2025
From: Eli Zaretskii
To: Stefan Monnier
Cc: 56469@debbugs.gnu.org
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sat, 09 Jul 2022 21:53:25 +0300
Message-Id: <83wncm15ju.fsf@gnu.org>

> From: Stefan Monnier
> Cc: 56469@debbugs.gnu.org
> Date: Sat, 09 Jul 2022 14:20:37 -0400
>
> >> I suggest the patch below. In a comment I suggest we don't try to use
> >> unibyte strings when a multibyte string would work as well. This is
> >> because for those ASCII-only strings, it's cheaper to test bytes==chars
> >> to (re)discover that they are ASCII-only (when they're multibyte) than
> >> having to loop through the bytes (when they're unibyte).
> >
> > Please bootstrap Emacs in a directory with such a name, and if that
> > works, I'm okay with installing this change.
>
> Just to clarify: by "this change" you refer to the change in the patch
> or the change suggested in the comment?

I meant the patch. The comment I didn't understand at all. It seemed
to be unrelated to the code and the change you were proposing.
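
For readers who want to see why the bytes quoted in the report are not
well-formed UTF-8, here is a small standalone C sketch. It is not Emacs
code: utf8_well_formed and check_report_bytes are hypothetical names
chosen for illustration, and the check is deliberately simplified (it
looks only at lead-byte/continuation-byte structure and does not reject
overlong forms or surrogates).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Emacs: return true if the
   NUL-terminated STR has valid UTF-8 lead/continuation structure.  */
bool
utf8_well_formed (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  size_t n = strlen (str);
  size_t i = 0;

  while (i < n)
    {
      unsigned char c = s[i];
      size_t len;

      if (c < 0x80)
        len = 1;                /* ASCII */
      else if ((c & 0xE0) == 0xC0)
        len = 2;
      else if ((c & 0xF0) == 0xE0)
        len = 3;
      else if ((c & 0xF8) == 0xF0)
        len = 4;
      else
        return false;           /* stray continuation or invalid lead byte */

      if (len > n - i)
        return false;           /* sequence is truncated */

      for (size_t k = 1; k < len; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;         /* lead byte not followed by a continuation */

      i += len;
    }
  return true;
}

/* The byte strings below are the ones quoted in the bug report.  */
void
check_report_bytes (void)
{
  /* The decoded file name alone is fine: \303\251 encodes "é".  */
  assert (utf8_well_formed ("f\303\251e"));
  /* The unibyte directory name has a lone \303 followed by ASCII 'a'.  */
  assert (!utf8_well_formed ("/tmp/\303a"));
  /* Copying both byte sequences into one buffer keeps the lone \303.  */
  assert (!utf8_well_formed ("/tmp/\303a/f\303\251e"));
}
```

The lone \303 byte is a two-byte lead that is followed by an ASCII
character, which is what makes the concatenated result ill-formed.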

From debbugs-submit-bounces@debbugs.gnu.org Sat Jul 09 22:23:20 2022
From: Stefan Kangas
To: control@debbugs.gnu.org
Subject: control message for bug #56469
Date: Sat, 9 Jul 2022 19:23:12 -0700

tags 56469 + patch
quit

From unknown Sat Jun 21 12:13:22 2025
From: Stefan Monnier
To: Eli Zaretskii
Cc: 56469@debbugs.gnu.org
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sun, 10 Jul 2022 10:23:28 -0400
In-Reply-To: <83y1x2177x.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 09 Jul 2022 21:17:22 +0300")

> Please bootstrap Emacs in a directory with such a name, and if that
> works, I'm okay with installing this change.

Pushed, thanks.

W.r.t. the comment, it's indeed unrelated to the patch (other than
the fact that it touches the same code). The question is when we do:

    finalname = (nchars == nbytes)
      ? make_uninit_string (nbytes)
      : make_uninit_multibyte_string (nchars, nbytes);

the actual bytes are "decoded" (i.e.
in our internal UTF-8 encoding), so (nchars == nbytes) checks whether
it's "pure ASCII" or not, and if it's pure ASCII we return a unibyte
string.

Our file-name manipulation routines always consider unibyte-ASCII and
multibyte-ASCII as "equivalent", and indeed DECODE_FILE and ENCODE_FILE
take advantage of that so as to return their argument as-is when it's
all-ASCII so as to avoid allocating a string unnecessarily.

So in the above code snippet, when the string is all-ASCII, we actually
have a choice, and both a unibyte string and a multibyte string should
work. Currently in that case we return a unibyte string, but I think in
such cases we're better off returning a multibyte string because the
subsequent "all-ASCII" test (that DE/ENCODE_FILE will perform when we
pass that filename to some further operation) will be more efficient
(it's a constant-time (nchars == nbytes) test whereas when the string is
unibyte it requires looking at each and every byte).

IOW, while it makes sense to return a "decoded unibyte" string from
DECODE_FILE in order to avoid an allocation, I don't think it makes
sense to return such a "decoded unibyte" string when we have to allocate
a new string anyway.
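
The cost difference described above can be sketched in standalone C.
This is only an illustration under stated assumptions: struct toy_string,
toy_ascii_only and toy_examples are hypothetical names, not the Emacs
Lisp_String layout or any Emacs function. The sketch assumes a
UTF-8-like encoding in which every non-ASCII character occupies at least
two bytes, so for a multibyte string equal character and byte counts
imply all-ASCII content.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy string header (hypothetical, not the Emacs representation).  */
struct toy_string
{
  const unsigned char *data;
  ptrdiff_t nbytes;     /* length in bytes */
  ptrdiff_t nchars;     /* length in characters */
  bool multibyte;       /* true if DATA holds UTF-8-like text */
};

/* Return true if S contains only ASCII bytes.  */
bool
toy_ascii_only (const struct toy_string *s)
{
  /* Multibyte: every non-ASCII character takes two or more bytes, so
     equal counts mean all-ASCII.  This is a constant-time test.  */
  if (s->multibyte)
    return s->nchars == s->nbytes;

  /* Unibyte: the counts are always equal, so they tell us nothing;
     every byte has to be inspected.  */
  for (ptrdiff_t i = 0; i < s->nbytes; i++)
    if (s->data[i] >= 0x80)
      return false;
  return true;
}

void
toy_examples (void)
{
  struct toy_string ascii_multi
    = { (const unsigned char *) "abc", 3, 3, true };
  struct toy_string utf8_multi
    = { (const unsigned char *) "f\303\251e", 4, 3, true };
  struct toy_string raw_uni
    = { (const unsigned char *) "f\351e", 3, 3, false };

  assert (toy_ascii_only (&ascii_multi));   /* 3 chars, 3 bytes */
  assert (!toy_ascii_only (&utf8_multi));   /* 3 chars, 4 bytes */
  assert (!toy_ascii_only (&raw_uni));      /* found only by scanning */
}
```

The last case shows why the shortcut is valid only for multibyte
strings: a unibyte string can hold a non-ASCII byte while its character
and byte counts are still equal.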

        Stefan

From unknown Sat Jun 21 12:13:22 2025
From: Eli Zaretskii
To: Stefan Monnier
Cc: 56469@debbugs.gnu.org
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sun, 10 Jul 2022 17:32:17 +0300
Message-Id: <83bktx11ji.fsf@gnu.org>

> From: Stefan Monnier
> Cc: 56469@debbugs.gnu.org
> Date: Sun, 10 Jul 2022 10:23:28 -0400
>
> W.r.t. the comment, it's indeed unrelated to the patch (other than
> the fact that it touches the same code). The question is when we do:
>
>     finalname = (nchars == nbytes)
>       ? make_uninit_string (nbytes)
>       : make_uninit_multibyte_string (nchars, nbytes);
>
> the actual bytes are "decoded" (i.e. in our internal UTF-8 encoding), so
> (nchars == nbytes) checks whether it's "pure ASCII" or not and if it's
> pure ASCII we return a unibyte string.

I don't think this is true, because early during startup we don't yet
have the coding-systems set up, and so the file names are unibyte and
undecoded. So that place in dired.c doesn't only handle ASCII when it
sees that nchars == nbytes.

> So in the above code snippet, when the string is all-ASCII, we actually
> have a choice, and both a unibyte string and a multibyte string should
> work. Currently in that case we return a unibyte string, but I think in
> such cases we're better off returning a multibyte string because the
> subsequent "all-ASCII" test (that DE/ENCODE_FILE will perform when we
> pass that filename to some further operation) will be more efficient
> (it's a constant-time (nchars == nbytes) test whereas when the string is
> unibyte it requires looking at each and every byte).
>
> IOW, while it makes sense to return a "decoded unibyte" string from
> DECODE_FILE in order to avoid an allocation, I don't think it makes
> sense to return such a "decoded unibyte" string when we have to allocate
> a new string anyway.

I'm not necessarily opposed to deciding that ASCII strings should be
multibyte, but doing so for file names will need careful auditing of
the sources with the startup process in mind.

From unknown Sat Jun 21 12:13:22 2025
From: Stefan Monnier
To: Eli Zaretskii
Cc: 56469@debbugs.gnu.org
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sun, 10 Jul 2022 10:58:30 -0400
In-Reply-To: <83bktx11ji.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 10 Jul 2022 17:32:17 +0300")

Eli Zaretskii [2022-07-10 17:32:17] wrote:
>> From: Stefan Monnier
>> Cc: 56469@debbugs.gnu.org
>> Date: Sun, 10 Jul 2022 10:23:28 -0400
>>
>> W.r.t. the comment, it's indeed unrelated to the patch (other than
>> the fact that it touches the same code). The question is when we do:
>>
>>     finalname = (nchars == nbytes)
>>       ?
>>         make_uninit_string (nbytes)
>>       : make_uninit_multibyte_string (nchars, nbytes);
>>
>> the actual bytes are "decoded" (i.e. in our internal UTF-8 encoding), so
>> (nchars == nbytes) checks whether it's "pure ASCII" or not and if it's
>> pure ASCII we return a unibyte string.
>
> I don't think this is true, because early during startup we don't yet
> have the coding-systems set up, and so the file names are unibyte and
> undecoded. So that place in dired.c doesn't only handle ASCII when it
> sees that nchars == nbytes.

Hmm... the early startup is actually not a worry here (according to my
tests `directory_files_internal` is first called when we get to
native-compile the macroexp/bytecomp, at which point all our coding
systems have been set up).

But indeed, if the file name coding system is something like `binary`,
DECODE_FILE will always return a unibyte string, so we may have
non-ASCII bytes when (nchars == nbytes).

Thanks, I'll update the comment accordingly.


        Stefan

From unknown Sat Jun 21 12:13:22 2025
From: Eli Zaretskii
To: Stefan Monnier
Cc: 56469@debbugs.gnu.org
Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sun, 10 Jul 2022 18:07:24 +0300
Message-Id: <83a69h0zwz.fsf@gnu.org>

> From: Stefan Monnier
> Cc: 56469@debbugs.gnu.org
> Date: Sun, 10 Jul 2022 10:58:30 -0400
>
> Eli Zaretskii [2022-07-10 17:32:17] wrote:
>
> > I don't think this is true, because early during startup we don't yet
> > have the coding-systems set up, and so the file names are unibyte and
> > undecoded. So that place in dired.c doesn't only handle ASCII when it
> > sees that nchars == nbytes.
>
> Hmm...
the early startup is actually not a worry here (according to my > tests `directory_files_internal` is first called when we get to > native-compile the macroexp/bytecomp, at which point all our coding > systems have been setup). That could be the situation _today_, but that's just sheer luck (or lack thereof). In general, all the file-handling code we have in fileio.c and dired.c should be equally prepared to handle unibyte non-ASCII file names and multibyte file names, because we may need that any time. When we make changes in Emacs, we shouldn't be worried whether those changes could cause some dired.c code be called early on during Emacs startup. From unknown Sat Jun 21 12:13:22 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal Resent-From: Stefan Monnier Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 10 Jul 2022 15:20:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56469 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch To: Eli Zaretskii Cc: 56469@debbugs.gnu.org Received: via spool by 56469-submit@debbugs.gnu.org id=B56469.165746637319584 (code B ref 56469); Sun, 10 Jul 2022 15:20:02 +0000 Received: (at 56469) by debbugs.gnu.org; 10 Jul 2022 15:19:33 +0000 Received: from localhost ([127.0.0.1]:37710 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAYib-00055n-9d for submit@debbugs.gnu.org; Sun, 10 Jul 2022 11:19:33 -0400 Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]:17704) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAYiZ-00055Z-Ll for 56469@debbugs.gnu.org; Sun, 10 Jul 2022 11:19:32 -0400 Received: from pmg3.iro.umontreal.ca (localhost [127.0.0.1]) by pmg3.iro.umontreal.ca (Proxmox) with ESMTP id E6934440980; Sun, 10 Jul 2022 11:19:25 -0400 (EDT) Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg3.iro.umontreal.ca (Proxmox) with ESMTP id 
99C6144067E; Sun, 10 Jul 2022 11:19:24 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1657466364; bh=rfudo7eijaRdLG0ycM2MUOc9Mc0U1jLXWE7S93Cy7zA=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=FynnPNMrm/1MAgGhXfIDCdhOEXh6J80NmNEQLM8ZgsgkVzPeCobKXKnXtxCAXhysq ZUxNPXLs14v9kYdOcW/OkrjNYd4NJCN+LUbS8If42pHjMmG57rQXncBClUpoICUmhS S0bAsgNzT4b8lJSs1f3WIaQRCip18wQw1O+vkBqv+RzUU0g6DaZ3SnWHWfL/b3Fi+P ihw1ZFsIrV8TD/LEGO37YgeQwOlplawZW/VAq7YjzoLuhiB0YvcyDluaMw/SA4qBqb dTcAR7aUgina9JkcdIM2DJgxnd9B86kIH0xLJQb3H2oQRe2BlCS9j8Efc3tWODq4YV 5a0fVtGu/24jQ== Received: from pastel (unknown [45.72.196.165]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id 60904120506; Sun, 10 Jul 2022 11:19:24 -0400 (EDT) From: Stefan Monnier In-Reply-To: <83a69h0zwz.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 10 Jul 2022 18:07:24 +0300") Message-ID: References: <83y1x2177x.fsf@gnu.org> <83bktx11ji.fsf@gnu.org> <83a69h0zwz.fsf@gnu.org> Date: Sun, 10 Jul 2022 11:19:22 -0400 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-SPAM-INFO: Spam detection results: 0 ALL_TRUSTED -1 Passed through trusted hosts only via SMTP AWL -0.062 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DKIM_SIGNED 0.1 Message has a DKIM or DK signature, not necessarily valid DKIM_VALID -0.1 Message has at least one valid DKIM or DK signature DKIM_VALID_AU -0.1 Message has a valid DKIM or DK signature from author's domain T_SCC_BODY_TEXT_LINE -0.01 - X-SPAM-LEVEL: X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > That could be the situation _today_, but that's 
just sheer luck (or > lack thereof). In general, all the file-handling code we have in > fileio.c and dired.c should be equally prepared to handle unibyte > non-ASCII file names and multibyte file names, because we may need > that any time. When we make changes in Emacs, we shouldn't be worried > whether those changes could cause some dired.c code be called early on > during Emacs startup. Agreed. In the updated comment I noted that we have a bug when we do (let ((file-name-coding-system 'binary)) (directory-files "/tmp/été/" 'full)) because we'll be concatenating the multibyte string "/tmp/été/" with the undecoded unibyte strings of the names of files in that directory. Stefan From unknown Sat Jun 21 12:13:22 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 10 Jul 2022 15:43:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56469 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch To: Stefan Monnier Cc: 56469@debbugs.gnu.org Received: via spool by 56469-submit@debbugs.gnu.org id=B56469.165746772121900 (code B ref 56469); Sun, 10 Jul 2022 15:43:01 +0000 Received: (at 56469) by debbugs.gnu.org; 10 Jul 2022 15:42:01 +0000 Received: from localhost ([127.0.0.1]:37722 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAZ4L-0005gy-ED for submit@debbugs.gnu.org; Sun, 10 Jul 2022 11:42:01 -0400 Received: from eggs.gnu.org ([209.51.188.92]:56048) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAZ4I-0005ge-E4 for 56469@debbugs.gnu.org; Sun, 10 Jul 2022 11:41:59 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:37550) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oAZ4D-00036M-2t; Sun, 10 Jul 2022 11:41:53 -0400 DKIM-Signature: v=1; a=rsa-sha256;
q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From: Date; bh=z86f4ACdhQX2l+1rDlh9UHIfDZXDKv9ek6JpR0fUR34=; b=oa5q3hyasCNerycUoGU8 a43jaxnXalTHgTs0gcyv5YrjgAMJVsEtQVJKjT+mMclJWICeVulONF76KkyyHsHzQMcUM5OW1qdna JvFUWGiF/QERch8jxkA94oh8RZz5ieCLqSLNUXW1IkXj8ZIKII0s2wTOr1nvWstloPumtRAs5YQlK 5JKYIN52+d8CjZ9WDvWqJeq7Dg83emyciB/aDpZsw4hPyuvYoU2LmmvHZO3HVszGsmE3vijRKmlAE ZlVlHzRt3Bj7bvq//nq7XfUjru5oemP/rW8tAsrNwXsAB8cPqTgcY8jEs5y+I1L8eFoMKvwMLCSYK MUOJ94K5xLGxfA==; Received: from [87.69.77.57] (port=2696 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oAZ4C-0000c3-JN; Sun, 10 Jul 2022 11:41:52 -0400 Date: Sun, 10 Jul 2022 18:41:38 +0300 Message-Id: <835yk50ybx.fsf@gnu.org> From: Eli Zaretskii In-Reply-To: (message from Stefan Monnier on Sun, 10 Jul 2022 11:19:22 -0400) References: <83y1x2177x.fsf@gnu.org> <83bktx11ji.fsf@gnu.org> <83a69h0zwz.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Stefan Monnier > Cc: 56469@debbugs.gnu.org > Date: Sun, 10 Jul 2022 11:19:22 -0400 > > In the updated comment I noted that we have a bug when we do > > (let ((file-name-coding-system 'binary)) > (directory-files "/tmp/été/" 'full) > > because we'll be concatenating the multibyte string "/tmp/été/" with > the undecoded unibyte strings of the names of files in that directory. I don't think file-name related stuff can work in Emacs when file-name-coding-system is set to an arbitrary value not reflecting the reality. Why would we want to support such cases? 
(But I don't object to the comment.) From unknown Sat Jun 21 12:13:22 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal Resent-From: Stefan Monnier Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 10 Jul 2022 22:14:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56469 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch To: Eli Zaretskii Cc: 56469@debbugs.gnu.org Received: via spool by 56469-submit@debbugs.gnu.org id=B56469.165749122829122 (code B ref 56469); Sun, 10 Jul 2022 22:14:01 +0000 Received: (at 56469) by debbugs.gnu.org; 10 Jul 2022 22:13:48 +0000 Received: from localhost ([127.0.0.1]:37897 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAfBU-0007Zd-K8 for submit@debbugs.gnu.org; Sun, 10 Jul 2022 18:13:48 -0400 Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]:49043) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAfBT-0007ZQ-3a for 56469@debbugs.gnu.org; Sun, 10 Jul 2022 18:13:47 -0400 Received: from pmg3.iro.umontreal.ca (localhost [127.0.0.1]) by pmg3.iro.umontreal.ca (Proxmox) with ESMTP id BE4A8440FEB; Sun, 10 Jul 2022 18:13:41 -0400 (EDT) Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg3.iro.umontreal.ca (Proxmox) with ESMTP id A485C440980; Sun, 10 Jul 2022 18:13:40 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1657491220; bh=y8XKB5qLZIuAJ+V9epKaaWWgwKCHJ3+lXC+bIVcb0gA=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=bkFVYueRbAlwgcYtLrhXzjQ56qDkWgGdeVMR+McGN352k9/df10R67rYSYR7VlCzn qH5k2qxIfKNnOcMSfLCgci/DA5wVTahiLUGUo+dQYtCNGlJ7oIF5gLeFBTCEwkEaM7 D4TBnfSpVNIVttQb7f8ESCLNDQIsa5x6jwUVR2xMetJTL6QAX61/6FLbbLXeGOfpjQ 1KPRQoDyBJsgrV/UUJD4ATjAH5OAADdkjcj5OvpfDFyOg8XmTiCCyI0N9oCyM6ZJBM 4tWzUcnip2ZMr0EZ7vdAbtWB0OKqbp0NJTwiOfMcyLOYEiAvFZ99IckKlJzhUQflAI 6Mzy64Y2yaXNQ== Received: from 
pastel (unknown [45.72.196.165]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id 70B48120494; Sun, 10 Jul 2022 18:13:40 -0400 (EDT) From: Stefan Monnier In-Reply-To: <835yk50ybx.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 10 Jul 2022 18:41:38 +0300") Message-ID: References: <83y1x2177x.fsf@gnu.org> <83bktx11ji.fsf@gnu.org> <83a69h0zwz.fsf@gnu.org> <835yk50ybx.fsf@gnu.org> Date: Sun, 10 Jul 2022 18:13:39 -0400 User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-SPAM-INFO: Spam detection results: 0 ALL_TRUSTED -1 Passed through trusted hosts only via SMTP AWL -0.062 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DKIM_SIGNED 0.1 Message has a DKIM or DK signature, not necessarily valid DKIM_VALID -0.1 Message has at least one valid DKIM or DK signature DKIM_VALID_AU -0.1 Message has a valid DKIM or DK signature from author's domain T_SCC_BODY_TEXT_LINE -0.01 - X-SPAM-LEVEL: X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > I don't think file-name related stuff can work in Emacs when > file-name-coding-system is set to an arbitrary value not reflecting > the reality. I'd tend to agree (tho 'binary' does sound like a valid value which should work in all cases under GNU/Linux). I'm just a bit annoyed at the idea that ELisp code can end up constructing a multibyte string whose bytes contain invalid utf-8 sequences, because I suspect we may end up with a core dump somewhere in such a circumstance. 
Stefan From unknown Sat Jun 21 12:13:22 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal Resent-From: Eli Zaretskii Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 11 Jul 2022 02:29:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56469 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch To: Stefan Monnier Cc: 56469@debbugs.gnu.org Received: via spool by 56469-submit@debbugs.gnu.org id=B56469.165750648321780 (code B ref 56469); Mon, 11 Jul 2022 02:29:01 +0000 Received: (at 56469) by debbugs.gnu.org; 11 Jul 2022 02:28:03 +0000 Received: from localhost ([127.0.0.1]:38085 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAj9X-0005fD-38 for submit@debbugs.gnu.org; Sun, 10 Jul 2022 22:28:03 -0400 Received: from eggs.gnu.org ([209.51.188.92]:36684) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAj9U-0005ei-Sy for 56469@debbugs.gnu.org; Sun, 10 Jul 2022 22:28:01 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:45196) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oAj9O-00011Q-I3; Sun, 10 Jul 2022 22:27:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=gmzQIookFSbeUPtkh5QDs/APL/7ijd7Thmh6zCnemsc=; b=m7U6jdPEzEhU 6GGaOFwPPCQOhxsou4CTwflUXnWEewJCC1rcPpJYUSWrqCMsF5lYN5eskwqvq5eYr8Z0DaUi1QNKg 6PHKmkrgM+SFslptuvCNIG5n6U02ZN9LRZdd44gnyxGQKSM3uM7m4bYs+GcthoD58Rgbl1zi2UlXk BrSe+yC8pNgjBcMXtmy9sXtp5XzCBpJXIsrKikmpuIqPZQ1FIqAo/A0UWA+LPqmgt/djGonmXTfF5 LEcH9hjXHM/gFl6NE2RRG1ZTd2U/yD8ZBAQr9xxe0XsL05baRX4GbVS2cUkwnjA7tl8mpwFqIIuHc lFimABPLkPpQ5pQTKRv8zw==; Received: from [87.69.77.57] (port=2279 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) 
(envelope-from ) id 1oAj9O-0007wO-0g; Sun, 10 Jul 2022 22:27:54 -0400 Date: Mon, 11 Jul 2022 05:27:39 +0300 Message-Id: <83zghgz8mc.fsf@gnu.org> From: Eli Zaretskii In-Reply-To: (message from Stefan Monnier on Sun, 10 Jul 2022 18:13:39 -0400) References: <83y1x2177x.fsf@gnu.org> <83bktx11ji.fsf@gnu.org> <83a69h0zwz.fsf@gnu.org> <835yk50ybx.fsf@gnu.org> X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Stefan Monnier > Cc: 56469@debbugs.gnu.org > Date: Sun, 10 Jul 2022 18:13:39 -0400 > > > I don't think file-name related stuff can work in Emacs when > > file-name-coding-system is set to an arbitrary value not reflecting > > the reality. > > I'd tend to agree (tho 'binary' does sound like a valid value which > should work in all cases under GNU/Linux). 'binary' exactly means that you end up with unibyte strings and with raw bytes in multibyte strings. > I'm just a bit annoyed at the idea that ELisp code can end up > constructing a multibyte string whose bytes contain invalid utf-8 > sequences, because I suspect we may end up with a core dump somewhere in > such a circumstance. Emacs should cope with this without dumping core, but the resulting file names might not be readable by humans nor friendly to other programs. 
From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 05 15:20:38 2022 Received: (at control) by debbugs.gnu.org; 5 Sep 2022 19:20:38 +0000 Received: from localhost ([127.0.0.1]:48921 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oVHe9-0003xw-Uu for submit@debbugs.gnu.org; Mon, 05 Sep 2022 15:20:38 -0400 Received: from quimby.gnus.org ([95.216.78.240]:43262) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oVHe8-0003xj-PE for control@debbugs.gnu.org; Mon, 05 Sep 2022 15:20:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Subject:From:To:Message-Id:Date:Sender:Reply-To:Cc: MIME-Version:Content-Type:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=lQoOmb7ZOMdTmNKiTY+t3NhXQNnJVIf0qBDRBFPiQ2k=; b=g+z3MiOg8iEUufugZi/BAyszPF vC/CIDsCD5FsoWIQeoDsf+Q6hUnunwIFUkDwUc0qtEKwA/z9sXXOzi6oHRSYefbUxxYzbT+TUEkwm VQ0Gyv3BiKhxeI7JnKOAM3Ik8FYmaZJRWr6HEIAeqk0oB256dTnWiPH7M4L13xGWBTY8=; Received: from [84.212.220.105] (helo=joga) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1oVHe0-0004ap-MR for control@debbugs.gnu.org; Mon, 05 Sep 2022 21:20:30 +0200 Date: Mon, 05 Sep 2022 21:20:28 +0200 Message-Id: <87sfl5zkzn.fsf@gnus.org> To: control@debbugs.gnu.org From: Lars Ingebrigtsen Subject: control message for bug #56469 X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. 
Content preview: tags 56469 - patch quit Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) tags 56469 - patch quit From unknown Sat Jun 21 12:13:22 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56469: 29.0.50; Unibyte dir in directory_files_internal Resent-From: Lars Ingebrigtsen Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 05 Sep 2022 19:22:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56469 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Eli Zaretskii Cc: Stefan Monnier , 56469@debbugs.gnu.org Received: via spool by 56469-submit@debbugs.gnu.org id=B56469.166240570815410 (code B ref 56469); Mon, 05 Sep 2022 19:22:02 +0000 Received: (at 56469) by debbugs.gnu.org; 5 Sep 2022 19:21:48 +0000 Received: from localhost ([127.0.0.1]:48929 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oVHfI-00040U-Iq for submit@debbugs.gnu.org; Mon, 05 Sep 2022 15:21:48 -0400 Received: from quimby.gnus.org ([95.216.78.240]:43288) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oVHfH-00040H-5M for 56469@debbugs.gnu.org; Mon, 05 Sep 2022 15:21:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Content-Type:MIME-Version:Message-ID:Date:References: In-Reply-To:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding: 
Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=nYbMoyfll8ZzeI5ijfEfpMUFvX/fI2nu60do+qtiIn4=; b=PIc9eAY7srCbK3383FyRdLtES6 P/G+Hza1q0pFpYSDL0dk9tu5z2gKzehxinnSbcCQPyGLYnYiO5M6gn8H6SxLd66SK4sUPs2ttWOZc PEvA7mTVnz2dcVVQt9XlBNXokfQPtuRzqoALqirDi6LXvwbPpt1hg3fWRbsbvwyVLT54=; Received: from [84.212.220.105] (helo=joga) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1oVHf7-0004bA-RS; Mon, 05 Sep 2022 21:21:39 +0200 From: Lars Ingebrigtsen In-Reply-To: <835yk50ybx.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 10 Jul 2022 18:41:38 +0300") References: <83y1x2177x.fsf@gnu.org> <83bktx11ji.fsf@gnu.org> <83a69h0zwz.fsf@gnu.org> <835yk50ybx.fsf@gnu.org> X-Now-Playing: Joni Mitchell's _Hejira_: "Coyote" Date: Mon, 05 Sep 2022 21:21:36 +0200 Message-ID: <87o7vtzkxr.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: Eli Zaretskii writes: > (But I don't object to the comment.) Skimming this bug report lightly, it seems like the proposed patch was applied, but then the discussion continued. It's not clear to me whether there's more to be done here -- should this report be cl [...] 
Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Eli Zaretskii writes: > (But I don't object to the comment.) Skimming this bug report lightly, it seems like the proposed patch was applied, but then the discussion continued. It's not clear to me whether there's more to be done here -- should this report be closed? From unknown Sat Jun 21 12:13:22 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Stefan Monnier Subject: bug#56469: closed (Re: bug#56469: 29.0.50; Unibyte dir in directory_files_internal) Message-ID: References: <83v8pzgvj2.fsf@gnu.org> X-Gnu-PR-Message: they-closed 56469 X-Gnu-PR-Package: emacs Reply-To: 56469@debbugs.gnu.org Date: Wed, 07 Sep 2022 13:33:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1662557582-31209-1" This is a multi-part message in MIME format... ------------=_1662557582-31209-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #56469: 29.0.50; Unibyte dir in directory_files_internal which was filed against the emacs package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 56469@debbugs.gnu.org. 
-- 56469: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=56469 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1662557582-31209-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 56469-done) by debbugs.gnu.org; 7 Sep 2022 13:32:43 +0000 Received: from localhost ([127.0.0.1]:54080 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oVvAZ-00086s-6C for submit@debbugs.gnu.org; Wed, 07 Sep 2022 09:32:43 -0400 Received: from eggs.gnu.org ([209.51.188.92]:43210) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oVvAW-00086f-S0 for 56469-done@debbugs.gnu.org; Wed, 07 Sep 2022 09:32:42 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:53644) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oVvAR-0006HG-0o; Wed, 07 Sep 2022 09:32:35 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=LXj23hk1utQ9Xg3V07PslmScFd4DQ2VjPhq3Ovf1lBU=; b=XwkBtzkZeewS WT5UJIIBcN4iqxwBUuxWR7dYN9StlVIKU9bmZklZ54yMcanC7GZuqtPplVk1ONikBgNtjzbSmuD+J broJH+tMghY1heT0qLnlzoepzXUY8tIz5TRz+0DMwd39LYrl5lVaAxNQ4pAGu6b38YzVyI+uXdmR2 5PpbYEbEMrm3xR3jdBj8EEqCSxWdHZ8WnZTHdnWm0KEe5oXjbUnFRJ0ORJCgJoTk5bmhWqWo0Qhsp /GIeVwij/pPZ8Mouk3ISr+cQH5sILN04olQicciROhBDg8dLJvajf3I2G2fuLWuqlhZ2W+vHZB/3x g192k2py9SV/hO7zBwqWLg==; Received: from [87.69.77.57] (port=3371 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oVvAP-0000mk-5s; Wed, 07 Sep 2022 09:32:34 -0400 Date: Wed, 07 Sep 2022 16:32:17 +0300 Message-Id: <83v8pzgvj2.fsf@gnu.org> From: Eli Zaretskii To: Lars Ingebrigtsen In-Reply-To: <87o7vtzkxr.fsf@gnus.org> (message from Lars Ingebrigtsen on Mon, 05 Sep 2022 21:21:36 +0200) Subject: Re: bug#56469: 29.0.50; Unibyte dir
in directory_files_internal References: <83y1x2177x.fsf@gnu.org> <83bktx11ji.fsf@gnu.org> <83a69h0zwz.fsf@gnu.org> <835yk50ybx.fsf@gnu.org> <87o7vtzkxr.fsf@gnus.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 56469-done Cc: 56469-done@debbugs.gnu.org, monnier@iro.umontreal.ca X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Lars Ingebrigtsen > Cc: Stefan Monnier , 56469@debbugs.gnu.org > Date: Mon, 05 Sep 2022 21:21:36 +0200 > > Skimming this bug report lightly, it seems like the proposed patch was > applied, but then the discussion continued. It's not clear to me > whether there's more to be done here -- should this report be closed? Yes, I think so. Done. ------------=_1662557582-31209-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 9 Jul 2022 17:45:21 +0000 Received: from localhost ([127.0.0.1]:35860 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAEW8-0006Pl-NS for submit@debbugs.gnu.org; Sat, 09 Jul 2022 13:45:21 -0400 Received: from lists.gnu.org ([209.51.188.17]:59458) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1oAEW6-0006Pd-En for submit@debbugs.gnu.org; Sat, 09 Jul 2022 13:45:18 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:60654) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oAEW3-0006MA-QD for bug-gnu-emacs@gnu.org; Sat, 09 Jul 2022 13:45:18 -0400 Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]:45963) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1oAEW0-0005bk-UP for bug-gnu-emacs@gnu.org; Sat, 09 Jul 2022 13:45:14 -0400 Received: from 
pmg1.iro.umontreal.ca (localhost.localdomain [127.0.0.1]) by pmg1.iro.umontreal.ca (Proxmox) with ESMTP id 42D05100182 for ; Sat, 9 Jul 2022 13:45:10 -0400 (EDT) Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg1.iro.umontreal.ca (Proxmox) with ESMTP id 752B410012B for ; Sat, 9 Jul 2022 13:45:04 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1657388704; bh=ADZsJdqYhkryUf4oGzhRT0jk+3ogCDsQVs6wL90dkN0=; h=From:To:Subject:Date:From; b=K5Ot9bninuciVARa3mhu/gDmmvVaKKLX6lb7XWcnty0R6MpVY6vnBleKWThDx3Isb vsFY4ZcMOPiU1HiBuptZSgLPiNc4ciwq/2A7op1W+zfzfCJbHOG1no02rxXzPyGnOR 0mw+ZepCDstuzoXCTbM3jPLDKhWX9wWNBONyeiqB/iBBfqFiuKMvhY2xThxm2H3qOM KtcZhOITv9o4XO1IrSE2TK2JMiwCRDr2pf5ZzUTHMFsZAsLA0hNJd9BFgm4eWBBQOQ LWT1Ag8H/cS5+SrFFiJyTDk6dYA4iZUGCt3JHfqvm0UgcWmVIo0b6ULI9aPkcLmhm+ LcdTKLV+WSDKQ== Received: from pastel (unknown [45.72.196.165]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id 1DF7F1204F3 for ; Sat, 9 Jul 2022 13:45:04 -0400 (EDT) From: Stefan Monnier To: bug-gnu-emacs@gnu.org Subject: 29.0.50; Unibyte dir in directory_files_internal Date: Sat, 09 Jul 2022 13:44:52 -0400 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-SPAM-INFO: Spam detection results: 0 ALL_TRUSTED -1 Passed through trusted hosts only via SMTP AWL -0.045 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DKIM_SIGNED 0.1 Message has a DKIM or DK signature, not necessarily valid DKIM_VALID -0.1 Message has at least one valid DKIM or DK signature DKIM_VALID_AU -0.1 Message has a valid DKIM or DK signature from author's domain T_SCC_BODY_TEXT_LINE -0.01 - X-SPAM-LEVEL: Received-SPF: pass client-ip=132.204.25.50; envelope-from=monnier@iro.umontreal.ca; helo=mailscanner.iro.umontreal.ca X-Spam_score_int: -42 X-Spam_score: -4.3 X-Spam_bar: ---- X-Spam_report: (-4.3 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, 
DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--)

Package: Emacs
Version: 29.0.50

If you have a directory named "/tmp/\303a" with a file named "fée"
inside, then (directory-files "/tmp/\303a" 'full) is likely to return
a funny string which is multibyte but contains an invalid utf-8
sequence (its bytes spell "/tmp/\303a/f\303\251e").
That string seems to be printed as "/tmp/¡/fée" which corresponds to
"/tmp/\303\241/f\303\251e".

Such a string with an invalid UTF-8 sequence is handled quite graciously
by Emacs, so I wasn't able to get an actual crash out of it, but it's
still something we should avoid.

I suggest the patch below.

In a comment I suggest we don't try to use unibyte strings when
a multibyte string would work as well.  This is because for those
ASCII-only strings, it's cheaper to test bytes==chars to (re)discover
that they are ASCII-only (when they're multibyte) than having to loop
through the bytes (when they're unibyte).


        Stefan


diff --git a/src/dired.c b/src/dired.c
index 6bb8c2fcb9f..33ddfafd8e7 100644
--- a/src/dired.c
+++ b/src/dired.c
@@ -219,6 +219,13 @@ directory_files_internal (Lisp_Object directory, Lisp_Object full,
     }
 #endif
 
+  if (!NILP (full) && !STRING_MULTIBYTE (directory))
+    { /* We will be concatenating 'directory' with local file name.
+         We always decode local file names, so in order to safely concatenate
+         them we need 'directory' to be multibyte.  */
+      directory = Fstring_to_multibyte (directory);
+    }
+
   ptrdiff_t directory_nbytes = SBYTES (directory);
   re_match_object = Qt;
 
@@ -263,9 +270,10 @@ directory_files_internal (Lisp_Object directory, Lisp_Object full,
           ptrdiff_t name_nbytes = SBYTES (name);
           ptrdiff_t nbytes = directory_nbytes + needsep + name_nbytes;
           ptrdiff_t nchars = SCHARS (directory) + needsep + SCHARS (name);
-          finalname = make_uninit_multibyte_string (nchars, nbytes);
-          if (nchars == nbytes)
-            STRING_SET_UNIBYTE (finalname);
+          /* FIXME: Why not make them all multibyte?  */
+          finalname = (nchars == nbytes)
+            ? make_uninit_string (nchars, nbytes)
+            : make_uninit_multibyte_string (nchars, nbytes);
           memcpy (SDATA (finalname), SDATA (directory), directory_nbytes);
           if (needsep)
             SSET (finalname, directory_nbytes, DIRECTORY_SEP);

------------=_1662557582-31209-1--