From unknown Wed Jun 18 00:25:23 2025
From: bug#27978 <27978@debbugs.gnu.org>
To: bug#27978 <27978@debbugs.gnu.org>
Subject: Status: Detection of section name in man.el
Date: Wed, 18 Jun 2025 07:25:23 +0000

retitle 27978 Detection of section name in man.el
reassign 27978 emacs
submitter 27978 Grégory Mounié
severity 27978 minor
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 05 19:57:16 2017
From: Grégory Mounié
To: bug-gnu-emacs@gnu.org
Subject: Detection of section name in man.el
Message-ID: <490651f5-e3f7-fd6d-e008-5c52d78fa675@imag.fr>
Date: Sun, 6 Aug 2017 01:44:19 +0200

When parsing manuals in languages with non-ASCII letters, section
names that use non-ASCII letters are not added to the table of contents.

I noticed the bug while reading the French bash manual: the quite useful
"COMMANDES INTERNES DE l'INTERPRÉTEUR" section (SHELL BUILTIN COMMANDS)
does not appear, because of the letter É.

I propose using character classes instead of ASCII ranges in the
appropriate regexp defvars.  This should not change anything for English
manuals, and it should work for many other languages.

It works great for the bash manual in French.

Grégory Mounié

[Attachment: 0001-Unicode-support-for-man-section-name-detection.patch]

From f9f8b027bcec6fe7aec2c0009eecdcd7e8880292 Mon Sep 17 00:00:00 2001
From: Grégory Mounié
Date: Sun, 6 Aug 2017 01:22:58 +0200
Subject: [PATCH] Unicode support for man section name detection

* lisp/man.el: Replace ascii interval by character class in order
to detect correctly the section names in the table of
content (eg. in the french version of the bash manual).
Copyright-paperwork-exempt: yes
---
 lisp/man.el | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lisp/man.el b/lisp/man.el
index 0e1c92956b..97a4758e7e 100644
--- a/lisp/man.el
+++ b/lisp/man.el
@@ -278,21 +278,21 @@ Man-cooked-hook
   :type 'hook
   :group 'man)
 
-(defvar Man-name-regexp "[-a-zA-Z0-9_­+][-a-zA-Z0-9_.:­+]*"
+(defvar Man-name-regexp "[-[:alnum:]_­+][-[:alnum:]_.:­+]*"
   "Regular expression describing the name of a manpage (without section).")
 
-(defvar Man-section-regexp "[0-9][a-zA-Z0-9+]*\\|[LNln]"
+(defvar Man-section-regexp "[[:digit:]][[:alnum:]+]*\\|[LNln]"
   "Regular expression describing a manpage section within parentheses.")
 
 (defvar Man-page-header-regexp
   (if (string-match "-solaris2\\." system-configuration)
-      (concat "^[-A-Za-z0-9_].*[ \t]\\(" Man-name-regexp
+      (concat "^[-[:alnum:]_].*[ \t]\\(" Man-name-regexp
               "(\\(" Man-section-regexp "\\))\\)$")
     (concat "^[ \t]*\\(" Man-name-regexp
             "(\\(" Man-section-regexp "\\))\\).*\\1"))
   "Regular expression describing the heading of a page.")
 
-(defvar Man-heading-regexp "^\\([A-Z][A-Z0-9 /-]+\\)$"
+(defvar Man-heading-regexp "^\\([[:upper:]][[:upper:][:digit:] /-]+\\)$"
   "Regular expression describing a manpage heading entry.")
 
 (defvar Man-see-also-regexp "SEE ALSO"
-- 
2.13.3

From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 18 04:50:22 2017
From: Eli Zaretskii
To: Grégory Mounié
Cc: 27978-done@debbugs.gnu.org
Subject: Re: bug#27978: Detection of section name in man.el
Date: Fri, 18 Aug 2017 11:49:57 +0300
Message-Id: <83r2w9dzuy.fsf@gnu.org>
In-reply-to: <490651f5-e3f7-fd6d-e008-5c52d78fa675@imag.fr> (Gregory.Mounie@imag.fr)

> From: Grégory Mounié
> Date: Sun, 6 Aug 2017 01:44:19 +0200
>
> When parsing manuals in languages with non-ASCII letters, section
> names that use non-ASCII letters are not added to the table of contents.
>
> I noticed the bug while reading the French bash manual: the quite useful
> "COMMANDES INTERNES DE l'INTERPRÉTEUR" section (SHELL BUILTIN COMMANDS)
> does not appear, because of the letter É.
>
> I propose using character classes instead of ASCII ranges in the
> appropriate regexp defvars.  This should not change anything for English
> manuals, and it should work for many other languages.

Thanks, I pushed these changes with some minor adjustments.
Specifically:

> -(defvar Man-section-regexp "[0-9][a-zA-Z0-9+]*\\|[LNln]"
> +(defvar Man-section-regexp "[[:digit:]][[:alnum:]+]*\\|[LNln]"
>    "Regular expression describing a manpage section within parentheses.")

I didn't change this one, because I think a section always uses only
ASCII letters and numbers, as in ".1n".  If you disagree, can you show
an example where this is not so?

> -(defvar Man-heading-regexp "^\\([A-Z][A-Z0-9 /-]+\\)$"
> +(defvar Man-heading-regexp "^\\([[:upper:]][[:upper:][:digit:] /-]+\\)$"
>    "Regular expression describing a manpage heading entry.")

I see no reason to replace 0-9 with [:digit:] here, since I think
non-ASCII digits will never be used in this context.  Do you agree?

Incidentally, I see quite a few similar regexps elsewhere in man.el;
did you audit all of them and establish that they don't need similar
changes?  If not, would you like to propose similar changes there?
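
To make the difference between the two forms of the heading regexp concrete, here is a minimal Emacs Lisp sketch. It is only an illustration: the function name `demo-heading-matches` and the sample heading "DÉFINITIONS" are hypothetical, and the character-class regexp assumes the adjustment described in the reply above (upper-case class, ASCII digits kept), which may not be character-for-character what was pushed.

```elisp
;; Hypothetical helper for illustration; it is not part of man.el.
(defun demo-heading-matches (heading)
  "Return a list (ASCII CLASS) telling which regexp matches HEADING."
  (let ((case-fold-search nil)   ; keep the match case-sensitive
        ;; ASCII-only form, copied from the removed line of the patch.
        (ascii-re "^\\([A-Z][A-Z0-9 /-]+\\)$")
        ;; Character-class form with ASCII digits kept (assumed form).
        (class-re "^\\([[:upper:]][[:upper:]0-9 /-]+\\)$"))
    (list (and (string-match-p ascii-re heading) t)
          (and (string-match-p class-re heading) t))))

(demo-heading-matches "SEE ALSO")     ; => (t t)
(demo-heading-matches "DÉFINITIONS")  ; => (nil t)
```

An all-ASCII heading is accepted by both regexps, so English manuals are unaffected, while a heading containing É is accepted only by the character-class form.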
From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 18 15:23:33 2017
From: Eli Zaretskii
To: Grégory Mounié
Cc: 27978@debbugs.gnu.org
Subject: Re: bug#27978: Detection of section name in man.el
Date: Fri, 18 Aug 2017 22:23:10 +0300
Message-Id: <83h8x4el41.fsf@gnu.org>
In-reply-to: <4f29a934-24db-6d10-db27-fd3a3a0c1269@imag.fr> (message from Grégory Mounié on Fri, 18 Aug 2017 19:53:44 +0200)
References: <490651f5-e3f7-fd6d-e008-5c52d78fa675@imag.fr> <83r2w9dzuy.fsf@gnu.org> <4f29a934-24db-6d10-db27-fd3a3a0c1269@imag.fr>

[Please keep the bug address on the CC list.]

> From: Grégory Mounié
> Date: Fri, 18 Aug 2017 19:53:44 +0200
>
> In brief, I would not change the other a-zA-Z regexps (details below).
>
> But I would change the SEE ALSO regexp (around line 298) to add other
> languages.  Should I file another bug report with another patch?
>
> (defvar Man-see-also-regexp "SEE ALSO"
>   "Regular expression for SEE ALSO heading (or your equivalent).
> This regexp should not start with a `^' character.")
>
> Using the Debian manpage translations as a reference, and running
> "zgrep -h SH man*/* | sort | uniq -c | sort -n" inside the appropriate
> /usr/share/man subdirectories to infer the values, I propose:
>
> "SEE ALSO\|VOIR AUSSI\|SIEHE AUCH\|VÉASE TAMBIÉN\|VEJA TAMBÉM\|VEDERE
> ANCHE\|ZOBACZ TAKŻE\|İLGİLİ BELGELER\|参照\|参见 SEE ALSO\|參見 SEE ALSO"
>
> (French, German, Spanish, Portuguese, Italian, Polish, Turkish,
> Japanese, Chinese CN, Chinese TW)

OK.  If no one objects, I will make this change soon.

Thanks.

> Details below about the a-zA-Z regexps:
>
> On 18/08/2017 at 10:49, Eli Zaretskii wrote:
> >
> > Thanks, I pushed these changes with some minor adjustments.
> > Specifically:
> >
> >> -(defvar Man-section-regexp "[0-9][a-zA-Z0-9+]*\\|[LNln]"
> >> +(defvar Man-section-regexp "[[:digit:]][[:alnum:]+]*\\|[LNln]"
> >>    "Regular expression describing a manpage section within parentheses.")
> >
> > I didn't change this one, because I think a section always uses only
> > ASCII letters and numbers, as in ".1n".  If you disagree, can you show
> > an example where this is not so?
> >
>
> I have installed the various multilingual standard manpages on my Debian
> system and could not grep a counterexample, so I guess it is fine as is.
>
> >> -(defvar Man-heading-regexp "^\\([A-Z][A-Z0-9 /-]+\\)$"
> >> +(defvar Man-heading-regexp "^\\([[:upper:]][[:upper:][:digit:] /-]+\\)$"
> >>    "Regular expression describing a manpage heading entry.")
> >
> > I see no reason to replace 0-9 with [:digit:] here, since I think
> > non-ASCII digits will never be used in this context.  Do you agree?
> >
> > Incidentally, I see quite a few similar regexps elsewhere in man.el;
> > did you audit all of them and establish that they don't need similar
> > changes?  If not, would you like to propose similar changes there?
> >
>
> There are 18 occurrences of a-zA-Z.  They look like detection logic
> carefully shaped by history, so I would not change them without a
> counterexample either.
>
> The first four seem related to the parsing of external commands, with
> particularities in the Windows port, so I would not recommend changing
> them.
> Occurrences 5 to 18 try to guess the manpage name around POS.  The main
> pattern is "-a-zA-Z0-9._+:".
>
> With the same set of multilingual manpages, I found only one character
> that is used in a manpage name and is not in the set: "[" (man [ leads
> you to test).  I suspect that adding "[" would cause more regressions
> than it would solve.
>
> Note that at line 720 the pattern is slightly different (it lacks
> "-._:").  I do not really understand why.

From unknown Wed Jun 18 00:25:23 2025
From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 16 Sep 2017 11:24:05 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
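
The multilingual SEE ALSO proposal in the last message is written with single backslashes, as one would read the regexp; inside an Emacs Lisp string each backslash has to be doubled. The sketch below shows that form for a subset of the proposed alternatives. The names `demo-see-also-regexp` and `demo-see-also-heading-p` are hypothetical and exist only for this illustration; the real variable in man.el is `Man-see-also-regexp`, and this is not necessarily how man.el applies it.

```elisp
;; Hypothetical variable for illustration; man.el's own variable is
;; `Man-see-also-regexp'.  Only three of the proposed alternatives are
;; listed here to keep the example short.
(defvar demo-see-also-regexp
  "SEE ALSO\\|VOIR AUSSI\\|SIEHE AUCH"
  "Alternation of SEE ALSO headings, without a leading `^'.")

(defun demo-see-also-heading-p (line)
  "Return t if LINE is exactly one of the headings in the alternation."
  (let ((case-fold-search nil))
    ;; The shy group \\(?: ... \\) keeps the anchors applying to the
    ;; whole alternation rather than only its first or last branch.
    (and (string-match-p
          (concat "\\`\\(?:" demo-see-also-regexp "\\)\\'")
          line)
         t)))

(demo-see-also-heading-p "VOIR AUSSI")   ; => t
(demo-see-also-heading-p "DESCRIPTION")  ; => nil
```

Grouping the alternation before adding anchors matters because `\|` has the lowest precedence in Emacs regexps, so an anchor concatenated directly onto the string would bind only to the nearest alternative.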