GNU bug report logs - #77387
[PATCH 0/2] man-db: Better parsing of man macros.

Package: guix-patches

Reported by: Sergey Trofimov <sarg <at> sarg.org.ru>

Date: Sun, 30 Mar 2025 14:27:02 UTC

Severity: normal

Tags: patch

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 77387 in the body.
You can then email your comments to 77387 AT debbugs.gnu.org in the normal way.


Report forwarded to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Sun, 30 Mar 2025 14:27:02 GMT)

Acknowledgement sent to Sergey Trofimov <sarg <at> sarg.org.ru>:
New bug report received and forwarded. Copy sent to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org. (Sun, 30 Mar 2025 14:27:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Sergey Trofimov <sarg <at> sarg.org.ru>
To: guix-patches <at> gnu.org
Cc: Sergey Trofimov <sarg <at> sarg.org.ru>
Subject: [PATCH 0/2] man-db: Better parsing of man macros.
Date: Sun, 30 Mar 2025 16:25:46 +0200
Hey Guix, I've noticed that `man -k` reports quite a lot of man pages as belonging to the wrong section:

--8<---------------cut here---------------start------------->8---
$ man -k "" | grep "(0)"
...
ssh-pkcs11-helper (0) - (unknown subject)
ssh-sk-helper (0)    - (unknown subject)
ssh_config (0)       - (unknown subject)
sshd (0)             - (unknown subject)
sshd_config (0)      - (unknown subject)
sudo (0)             - (unknown subject)
sudo.conf (0)        - (unknown subject)
tc-cgroup (0)        - control group based traffic control filter
tc-connmark (0)      - (unknown subject)
...
--8<---------------cut here---------------end--------------->8---

A side effect is that `M-x man` doesn't list such pages in auto-completion. I've attempted to fix that; see the following patches.

With the patches applied, `man -k` and `M-x man` work properly:

--8<---------------cut here---------------start------------->8---
$ man -k sudo
cvtsudoers (1)       - (unknown subject)
sudo (8)             - (unknown subject)
sudo.conf (5)        - (unknown subject)
sudo_logsrv.proto (5) - (unknown subject)
sudo_logsrvd (8)     - (unknown subject)
...
--8<---------------cut here---------------end--------------->8---


Note that synopsis extraction also needs improvement; however, that turns out
to be more complicated, as proper formatting requires cleaning up and expanding
macros.

Sergey Trofimov (2):
  man-db: Parse man macro arguments better.
  man-db: Support mdoc-formatted man pages.

 guix/man-db.scm | 52 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 9 deletions(-)


base-commit: 2ed28b5c24c599b2f9bc60dfc93151cf489ca477
-- 
2.49.0





Information forwarded to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Sun, 30 Mar 2025 15:51:01 GMT)

Message #8 received at 77387 <at> debbugs.gnu.org:

From: Sergey Trofimov <sarg <at> sarg.org.ru>
To: 77387 <at> debbugs.gnu.org
Cc: Sergey Trofimov <sarg <at> sarg.org.ru>
Subject: [PATCH 1/2] man-db: Parse man macro arguments better.
Date: Sun, 30 Mar 2025 16:32:54 +0200
* guix/man-db.scm (man-macro-tokenize): New procedure to parse man
macros.
(man-page->entry): Parse macro line using man-macro-tokenize.

Change-Id: Iea0ffbc65290757df746138e0a6174646b5a3eb8
---
 guix/man-db.scm | 52 ++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/guix/man-db.scm b/guix/man-db.scm
index bba90ed473..44c01ac298 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -161,16 +161,50 @@ (define (read-synopsis port)
       (line
        (loop (cons line lines))))))
 
+(define (man-macro-tokenize input)
+  (let loop ((pos 0)
+             (tokens '())
+             (current '())
+             (in-string? #f))
+    (if (>= pos (string-length input))
+        ;; End of input
+        (unless in-string?
+          (reverse (if (null? current)
+                       tokens
+                       (cons (list->string (reverse current)) tokens))))
+        (let ((c (string-ref input pos)))
+          (cond
+           ;; Inside a string
+           (in-string?
+            (if (char=? c #\")
+                (if (and (< (+ pos 1) (string-length input))
+                         (char=? (string-ref input (+ pos 1)) #\"))
+                    ;; Double quote inside string
+                    (loop (+ pos 2) tokens (cons #\" current) #t)
+                    ;; End of string
+                    (loop (+ pos 1) (cons (list->string (reverse current)) tokens) '() #f))
+                ;; Regular character in string
+                (loop (+ pos 1) tokens (cons c current) #t)))
+
+           ;; Whitespace outside string
+           ((char-whitespace? c)
+            (if (null? current)
+                (loop (+ pos 1) tokens '() #f)
+                (loop (+ pos 1) (cons (list->string (reverse current)) tokens) '() #f)))
+
+           ;; Start of string
+           ((char=? c #\")
+            (if (null? current)
+                (loop (+ pos 1) tokens '() #t)
+                (loop pos (cons (list->string (reverse current)) tokens) '() #f)))
+
+           ;; Symbol character
+           (else
+            (loop (+ pos 1) tokens (cons c current) #f)))))))
+
 (define* (man-page->entry file #:optional (resolve identity))
   "Parse FILE, a gzip or zstd compressed man page, and return a <mandb-entry>
 for it."
-  (define (string->number* str)
-    (if (and (string-prefix? "\"" str)
-             (> (string-length str) 1)
-             (string-suffix? "\"" str))
-        (string->number (string-drop (string-drop-right str 1) 1))
-        (string->number str)))
-
   (define call-with-input-port*
     (cond
      ((gzip-compressed? file) call-with-gzip-input-port)
@@ -189,8 +223,8 @@ (define* (man-page->entry file #:optional (resolve identity))
               (if (eof-object? line)
                   (mandb-entry file name (or section 0) (or synopsis "")
                                kind)
-                  (match (string-tokenize line)
-                    ((".TH" name (= string->number* section) _ ...)
+                  (match (if (string-prefix? "." line) (man-macro-tokenize line) #f)
+                    ((".TH" name (= string->number section) _ ...)
                      (loop name section synopsis kind))
                     ((".SH" (or "NAME" "\"NAME\""))
                      (loop name section (read-synopsis port) kind))
-- 
2.49.0
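
For context, here is a minimal sketch (not part of the patch) of why splitting on whitespace alone falls short. The `.TH` line is a made-up example. `string-tokenize` is the Guile procedure the patch replaces; with its default token set (`char-set:graphic`) double quotes are ordinary token characters, so they stay inside the tokens and a quoted argument containing spaces is broken apart.

```scheme
;; Hypothetical title line; not taken from any real man page.
(define line ".TH \"SUDO\" \"8\" \"January 1, 2025\"")

;; With the default token set (char-set:graphic), double quotes are ordinary
;; token characters and every space ends a token.
(define tokens (string-tokenize line))

(write tokens)
(newline)
;; prints: (".TH" "\"SUDO\"" "\"8\"" "\"January" "1," "2025\"")
```

The name token keeps its literal quotes and the quoted date is split into three tokens, which is the kind of input a quote-aware tokenizer handles as a single argument.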





Information forwarded to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Sun, 30 Mar 2025 15:51:02 GMT)

Message #11 received at 77387 <at> debbugs.gnu.org:

From: Sergey Trofimov <sarg <at> sarg.org.ru>
To: 77387 <at> debbugs.gnu.org
Cc: Sergey Trofimov <sarg <at> sarg.org.ru>
Subject: [PATCH 2/2] man-db: Support mdoc-formatted man pages.
Date: Sun, 30 Mar 2025 16:32:55 +0200
* guix/man-db.scm (man-page->entry): Extract man name and section from
.Dt macro.

Change-Id: I02dc99d73dceecdb077315805025efad9a650e91
---
 guix/man-db.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/guix/man-db.scm b/guix/man-db.scm
index 44c01ac298..44668a3ebf 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -224,7 +224,7 @@ (define* (man-page->entry file #:optional (resolve identity))
                   (mandb-entry file name (or section 0) (or synopsis "")
                                kind)
                   (match (if (string-prefix? "." line) (man-macro-tokenize line) #f)
-                    ((".TH" name (= string->number section) _ ...)
+                    (((or ".TH" ".Dt") name (= string->number section) _ ...)
                      (loop name section synopsis kind))
                     ((".SH" (or "NAME" "\"NAME\""))
                      (loop name section (read-synopsis port) kind))
-- 
2.49.0





Information forwarded to guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Tue, 01 Apr 2025 12:09:04 GMT)

Message #14 received at 77387 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Sergey Trofimov <sarg <at> sarg.org.ru>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, 77387 <at> debbugs.gnu.org,
 Simon Tournier <zimon.toutoune <at> gmail.com>, Mathieu Othacehe <othacehe <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, Christopher Baines <guix <at> cbaines.net>
Subject: Re: [bug#77387] [PATCH 1/2] man-db: Parse man macro arguments better.
Date: Tue, 01 Apr 2025 14:07:35 +0200
Hi!

Glad you fixed this problem. :-)

Sergey Trofimov <sarg <at> sarg.org.ru> skribis:

> * guix/man-db.scm (man-macro-tokenize): New procedure to parse man
> macros.
> (man-page->entry): Parse macro line using man-macro-tokenize.
>
> Change-Id: Iea0ffbc65290757df746138e0a6174646b5a3eb8

[...]

> +(define (man-macro-tokenize input)

Could you add a docstring explaining what it takes and what it returns?

> +  (let loop ((pos 0)
> +             (tokens '())
> +             (current '())

Maybe s/current/characters/ ?

> +             (in-string? #f))
> +    (if (>= pos (string-length input))
> +        ;; End of input
> +        (unless in-string?
> +          (reverse (if (null? current)
> +                       tokens
> +                       (cons (list->string (reverse current)) tokens))))

So this procedure can return *unspecified*, right?  Sounds fishy.

> @@ -189,8 +223,8 @@ (define* (man-page->entry file #:optional (resolve identity))
>                (if (eof-object? line)
>                    (mandb-entry file name (or section 0) (or synopsis "")
>                                 kind)
> -                  (match (string-tokenize line)
> -                    ((".TH" name (= string->number* section) _ ...)
> +                  (match (if (string-prefix? "." line) (man-macro-tokenize line) #f)
> +                    ((".TH" name (= string->number section) _ ...)

Please add a comment above ‘match’ explaining what’s happening (why we
call ‘man-macro-tokenize’ etc.).

Also: (and (string-prefix? "." line) (man-macro-tokenize line))

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Tue, 01 Apr 2025 12:09:05 GMT)

Message #17 received at 77387 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Sergey Trofimov <sarg <at> sarg.org.ru>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, 77387 <at> debbugs.gnu.org,
 Simon Tournier <zimon.toutoune <at> gmail.com>, Mathieu Othacehe <othacehe <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, Christopher Baines <guix <at> cbaines.net>
Subject: Re: [bug#77387] [PATCH 2/2] man-db: Support mdoc-formatted man pages.
Date: Tue, 01 Apr 2025 14:08:31 +0200
Sergey Trofimov <sarg <at> sarg.org.ru> skribis:

> * guix/man-db.scm (man-page->entry): Extract man name and section from
> .Dt macro.
>
> Change-Id: I02dc99d73dceecdb077315805025efad9a650e91

[...]

>                    (match (if (string-prefix? "." line) (man-macro-tokenize line) #f)
> -                    ((".TH" name (= string->number section) _ ...)
> +                    (((or ".TH" ".Dt") name (= string->number section) _ ...)

Likewise, please add a short comment above the clause explaining that
‘.Dt’ is produced by ‘mandoc’ (did I get that right?).

Ludo’.




Information forwarded to sarg <at> sarg.org.ru, ludo <at> gnu.org, guix <at> cbaines.net, dev <at> jpoiret.xyz, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Tue, 01 Apr 2025 19:33:02 GMT)

Message #20 received at 77387 <at> debbugs.gnu.org:

From: Sergey Trofimov <sarg <at> sarg.org.ru>
To: 77387 <at> debbugs.gnu.org
Cc: Sergey Trofimov <sarg <at> sarg.org.ru>
Subject: [PATCH v1 1/2] man-db: Parse man macro arguments better.
Date: Tue,  1 Apr 2025 21:31:59 +0200
* guix/man-db.scm (man-macro-tokenize): New procedure to parse man
macros.
(man-page->entry): Parse macro line using man-macro-tokenize.

Change-Id: Iea0ffbc65290757df746138e0a6174646b5a3eb8
---
 guix/man-db.scm | 56 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 47 insertions(+), 9 deletions(-)

diff --git a/guix/man-db.scm b/guix/man-db.scm
index bba90ed473..94231264f0 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -161,16 +161,52 @@ (define (read-synopsis port)
       (line
        (loop (cons line lines))))))
 
+(define (man-macro-tokenize input)
+  "Split INPUT string, a man macro invocation, into a list containing the macro's
+name followed by its arguments."
+  (let loop ((pos 0)
+             (tokens '())
+             (characters '())
+             (in-string? #f))
+    (if (>= pos (string-length input))
+        ;; End of input
+        (unless in-string?
+          (reverse (if (null? characters)
+                       tokens
+                       (cons (list->string (reverse characters)) tokens))))
+        (let ((c (string-ref input pos)))
+          (cond
+           ;; Inside a string
+           (in-string?
+            (if (char=? c #\")
+                (if (and (< (+ pos 1) (string-length input))
+                         (char=? (string-ref input (+ pos 1)) #\"))
+                    ;; Double quote inside string
+                    (loop (+ pos 2) tokens (cons #\" characters) #t)
+                    ;; End of string
+                    (loop (+ pos 1) (cons (list->string (reverse characters)) tokens) '() #f))
+                ;; Regular character in string
+                (loop (+ pos 1) tokens (cons c characters) #t)))
+
+           ;; Whitespace outside string
+           ((char-whitespace? c)
+            (if (null? characters)
+                (loop (+ pos 1) tokens '() #f)
+                (loop (+ pos 1) (cons (list->string (reverse characters)) tokens) '() #f)))
+
+           ;; Start of string
+           ((char=? c #\")
+            (if (null? characters)
+                (loop (+ pos 1) tokens '() #t)
+                (loop pos (cons (list->string (reverse characters)) tokens) '() #f)))
+
+           ;; Symbol character
+           (else
+            (loop (+ pos 1) tokens (cons c characters) #f)))))))
+
 (define* (man-page->entry file #:optional (resolve identity))
   "Parse FILE, a gzip or zstd compressed man page, and return a <mandb-entry>
 for it."
-  (define (string->number* str)
-    (if (and (string-prefix? "\"" str)
-             (> (string-length str) 1)
-             (string-suffix? "\"" str))
-        (string->number (string-drop (string-drop-right str 1) 1))
-        (string->number str)))
-
   (define call-with-input-port*
     (cond
      ((gzip-compressed? file) call-with-gzip-input-port)
@@ -189,8 +225,10 @@ (define* (man-page->entry file #:optional (resolve identity))
               (if (eof-object? line)
                   (mandb-entry file name (or section 0) (or synopsis "")
                                kind)
-                  (match (string-tokenize line)
-                    ((".TH" name (= string->number* section) _ ...)
+                  ;; man 7 groff groff_mdoc groff_man
+                  ;; look for metadata in macro invocations (lines starting with .)
+                  (match (and (string-prefix? "." line) (man-macro-tokenize line))
+                    ((".TH" name (= string->number section) _ ...)
                      (loop name section synopsis kind))
                     ((".SH" (or "NAME" "\"NAME\""))
                      (loop name section (read-synopsis port) kind))

base-commit: 5735c278e16517d9be5e26235fe68dea9bae3527
prerequisite-patch-id: f9cc903b8048c8c6fde576fbf38ab110263020e3
prerequisite-patch-id: 220ddf11addf3a6c7ab3b349077bca6849241556
prerequisite-patch-id: fc7d254c8dc198bc2f083e1c8aea18960c73b165
prerequisite-patch-id: b6d30068ce4971d4d8e67517229916df4e76c529
-- 
2.49.0





Information forwarded to sarg <at> sarg.org.ru, ludo <at> gnu.org, guix <at> cbaines.net, dev <at> jpoiret.xyz, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Tue, 01 Apr 2025 19:33:02 GMT)

Message #23 received at 77387 <at> debbugs.gnu.org:

From: Sergey Trofimov <sarg <at> sarg.org.ru>
To: 77387 <at> debbugs.gnu.org
Cc: Sergey Trofimov <sarg <at> sarg.org.ru>
Subject: [PATCH v1 2/2] man-db: Support mdoc-formatted man pages.
Date: Tue,  1 Apr 2025 21:32:00 +0200
* guix/man-db.scm (man-page->entry): Extract man name and section from
.Dt macro.

Change-Id: I02dc99d73dceecdb077315805025efad9a650e91
---
 guix/man-db.scm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/guix/man-db.scm b/guix/man-db.scm
index 94231264f0..7601580c40 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -228,10 +228,13 @@ (define* (man-page->entry file #:optional (resolve identity))
                   ;; man 7 groff groff_mdoc groff_man
                   ;; look for metadata in macro invocations (lines starting with .)
                   (match (and (string-prefix? "." line) (man-macro-tokenize line))
-                    ((".TH" name (= string->number section) _ ...)
+                    ;; "Title Header" or "Document title"
+                    (((or ".TH" ".Dt") name (= string->number section) _ ...)
                      (loop name section synopsis kind))
+                    ;; "Section Header"
                     ((".SH" (or "NAME" "\"NAME\""))
                      (loop name section (read-synopsis port) kind))
+                    ;; include source
                     ((".so" link)
                      (match (and=> (resolve link)
                                    (cut man-page->entry <> resolve))
-- 
2.49.0





Information forwarded to guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Tue, 01 Apr 2025 19:43:02 GMT)

Message #26 received at 77387 <at> debbugs.gnu.org:

From: Sergey Trofimov <sarg <at> sarg.org.ru>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, 77387 <at> debbugs.gnu.org,
 Simon Tournier <zimon.toutoune <at> gmail.com>, Mathieu Othacehe <othacehe <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, Christopher Baines <guix <at> cbaines.net>
Subject: Re: [bug#77387] [PATCH 1/2] man-db: Parse man macro arguments better.
Date: Tue, 01 Apr 2025 21:42:02 +0200
Hi Ludovic,

I've sent an amended series.

Ludovic Courtès <ludo <at> gnu.org> writes:

>> +             (in-string? #f))
>> +    (if (>= pos (string-length input))
>> +        ;; End of input
>> +        (unless in-string?
>> +          (reverse (if (null? current)
>> +                       tokens
>> +                       (cons (list->string (reverse current)) tokens))))
>
> So this procedure can return *unspecified*, right?  Sounds fishy.
>
Why is it fishy? Is it unconventional? Such a return value is handled
correctly by the calling code (`match`).




Information forwarded to guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Tue, 08 Apr 2025 15:31:01 GMT)

Message #29 received at 77387 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Sergey Trofimov <sarg <at> sarg.org.ru>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, 77387 <at> debbugs.gnu.org,
 Simon Tournier <zimon.toutoune <at> gmail.com>, Mathieu Othacehe <othacehe <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, Christopher Baines <guix <at> cbaines.net>
Subject: Re: [bug#77387] [PATCH 1/2] man-db: Parse man macro arguments better.
Date: Tue, 08 Apr 2025 17:11:55 +0200
Hi,

Sergey Trofimov <sarg <at> sarg.org.ru> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>>> +             (in-string? #f))
>>> +    (if (>= pos (string-length input))
>>> +        ;; End of input
>>> +        (unless in-string?
>>> +          (reverse (if (null? current)
>>> +                       tokens
>>> +                       (cons (list->string (reverse current)) tokens))))
>>
>> So this procedure can return *unspecified*, right?  Sounds fishy.
>>
> Why is it fishy? Is it unconventional? Such a return value is handled
> correctly by the calling code (`match`).

It’s unconventional; usually, procedures are monomorphic and in this
case, the expectation is that it always returns a list of tokens.

I would either return the empty list in the ‘in-string?’ case or throw
an exception (because that means we failed to parse the thing).

Does that make sense?

Thanks,
Ludo’.
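
To make the point about return types concrete, here is a small sketch; `classify` is a hypothetical stand-in for the dispatch in `man-page->entry`, not code from the patch. It shows that `unless` yields the unspecified value when its condition holds, and that a `match` made of list patterns falls through to its catch-all clause for that value just as it does for the empty list. The calling code copes either way; the difference is in what other callers of the tokenizer can rely on.

```scheme
(use-modules (ice-9 match))

;; Hypothetical stand-in for the dispatch in man-page->entry.
(define (classify tokens)
  (match tokens
    ((".TH" name section _ ...) (list 'title name section))
    (_ 'ignored)))

;; What the v1 tokenizer returns for an unterminated quote: the value of an
;; `unless' whose condition is true, i.e. the unspecified value.
(define unterminated (unless #t '(".TH" "FOO" "1")))

(define results
  (list (unspecified? unterminated)          ;not a list at all
        (classify unterminated)              ;falls through to the catch-all
        (classify '())                       ;so does the empty list
        (classify '(".TH" "FOO" "1"))))      ;a well-formed title line

(write results)
(newline)
;; prints: (#t ignored ignored (title "FOO" "1"))
```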




Information forwarded to sarg <at> sarg.org.ru, ludo <at> gnu.org, guix <at> cbaines.net, dev <at> jpoiret.xyz, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Wed, 09 Apr 2025 12:47:02 GMT)

Message #32 received at 77387 <at> debbugs.gnu.org:

From: Sergey Trofimov <sarg <at> sarg.org.ru>
To: 77387 <at> debbugs.gnu.org
Cc: Sergey Trofimov <sarg <at> sarg.org.ru>
Subject: [PATCH v2 1/2] man-db: Parse man macro arguments better.
Date: Wed,  9 Apr 2025 14:46:40 +0200
* guix/man-db.scm (man-macro-tokenize): New procedure to parse man
macros.
(man-page->entry): Parse macro line using man-macro-tokenize.

Change-Id: Iea0ffbc65290757df746138e0a6174646b5a3eb8
---
 guix/man-db.scm | 55 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 46 insertions(+), 9 deletions(-)

diff --git a/guix/man-db.scm b/guix/man-db.scm
index bba90ed473..1259658f52 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -161,16 +161,51 @@ (define (read-synopsis port)
       (line
        (loop (cons line lines))))))
 
+(define (man-macro-tokenize input)
+  "Split INPUT string, a man macro invocation, into a list containing the macro's
+name followed by its arguments."
+  (let loop ((pos 0)
+             (tokens '())
+             (characters '())
+             (in-string? #f))
+    (if (>= pos (string-length input))
+        ;; End of input
+        (reverse (if (null? characters)
+                     tokens
+                     (cons (list->string (reverse characters)) tokens)))
+        (let ((c (string-ref input pos)))
+          (cond
+           ;; Inside a string
+           (in-string?
+            (if (char=? c #\")
+                (if (and (< (+ pos 1) (string-length input))
+                         (char=? (string-ref input (+ pos 1)) #\"))
+                    ;; Double quote inside string
+                    (loop (+ pos 2) tokens (cons #\" characters) #t)
+                    ;; End of string
+                    (loop (+ pos 1) (cons (list->string (reverse characters)) tokens) '() #f))
+                ;; Regular character in string
+                (loop (+ pos 1) tokens (cons c characters) #t)))
+
+           ;; Whitespace outside string
+           ((char-whitespace? c)
+            (if (null? characters)
+                (loop (+ pos 1) tokens '() #f)
+                (loop (+ pos 1) (cons (list->string (reverse characters)) tokens) '() #f)))
+
+           ;; Start of string
+           ((char=? c #\")
+            (if (null? characters)
+                (loop (+ pos 1) tokens '() #t)
+                (loop pos (cons (list->string (reverse characters)) tokens) '() #f)))
+
+           ;; Symbol character
+           (else
+            (loop (+ pos 1) tokens (cons c characters) #f)))))))
+
 (define* (man-page->entry file #:optional (resolve identity))
   "Parse FILE, a gzip or zstd compressed man page, and return a <mandb-entry>
 for it."
-  (define (string->number* str)
-    (if (and (string-prefix? "\"" str)
-             (> (string-length str) 1)
-             (string-suffix? "\"" str))
-        (string->number (string-drop (string-drop-right str 1) 1))
-        (string->number str)))
-
   (define call-with-input-port*
     (cond
      ((gzip-compressed? file) call-with-gzip-input-port)
@@ -189,8 +224,10 @@ (define* (man-page->entry file #:optional (resolve identity))
               (if (eof-object? line)
                   (mandb-entry file name (or section 0) (or synopsis "")
                                kind)
-                  (match (string-tokenize line)
-                    ((".TH" name (= string->number* section) _ ...)
+                  ;; man 7 groff groff_mdoc groff_man
+                  ;; look for metadata in macro invocations (lines starting with .)
+                  (match (and (string-prefix? "." line) (man-macro-tokenize line))
+                    ((".TH" name (= string->number section) _ ...)
                      (loop name section synopsis kind))
                     ((".SH" (or "NAME" "\"NAME\""))
                      (loop name section (read-synopsis port) kind))

base-commit: 621d09a185b106364fe7636923ab39c8bca35141
--
2.49.0





Information forwarded to sarg <at> sarg.org.ru, ludo <at> gnu.org, guix <at> cbaines.net, dev <at> jpoiret.xyz, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Wed, 09 Apr 2025 12:48:02 GMT)

Message #35 received at 77387 <at> debbugs.gnu.org:

From: Sergey Trofimov <sarg <at> sarg.org.ru>
To: 77387 <at> debbugs.gnu.org
Cc: Sergey Trofimov <sarg <at> sarg.org.ru>
Subject: [PATCH v2 2/2] man-db: Support mdoc-formatted man pages.
Date: Wed,  9 Apr 2025 14:46:41 +0200
* guix/man-db.scm (man-page->entry): Extract man name and section from
.Dt macro.

Change-Id: I02dc99d73dceecdb077315805025efad9a650e91
---
 guix/man-db.scm | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/guix/man-db.scm b/guix/man-db.scm
index 1259658f52..59723fb336 100644
--- a/guix/man-db.scm
+++ b/guix/man-db.scm
@@ -227,10 +227,13 @@ (define* (man-page->entry file #:optional (resolve identity))
                   ;; man 7 groff groff_mdoc groff_man
                   ;; look for metadata in macro invocations (lines starting with .)
                   (match (and (string-prefix? "." line) (man-macro-tokenize line))
-                    ((".TH" name (= string->number section) _ ...)
+                    ;; "Title Header" or "Document title"
+                    (((or ".TH" ".Dt") name (= string->number section) _ ...)
                      (loop name section synopsis kind))
+                    ;; "Section Header"
                     ((".SH" (or "NAME" "\"NAME\""))
                      (loop name section (read-synopsis port) kind))
+                    ;; include source
                     ((".so" link)
                      (match (and=> (resolve link)
                                    (cut man-page->entry <> resolve))
-- 
2.49.0
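
As an illustration of the combined clause, here is a sketch using hand-written token lists rather than output from real pages. `title+section` is a hypothetical helper that mirrors only the first clause of the patched `match`: both a `.TH` line and a `.Dt` line yield the page name and a numeric section, and anything else is left alone.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper mirroring the first clause of the patched `match'.
(define (title+section tokens)
  (match tokens
    (((or ".TH" ".Dt") name (= string->number section) _ ...)
     (cons name section))
    (_ #f)))

;; Token lists as the tokenizer would be expected to produce them
;; (made-up examples).
(define man-style  (title+section '(".TH" "SUDO" "8" "some date")))
(define mdoc-style (title+section '(".Dt" "SSH" "1")))
(define other      (title+section '(".SH" "NAME")))

(write (list man-style mdoc-style other))
(newline)
;; prints: (("SUDO" . 8) ("SSH" . 1) #f)
```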





Information forwarded to guix-patches <at> gnu.org:
bug#77387; Package guix-patches. (Wed, 09 Apr 2025 13:01:02 GMT)

Message #38 received at 77387 <at> debbugs.gnu.org:

From: Sergey Trofimov <sarg <at> sarg.org.ru>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, 77387 <at> debbugs.gnu.org,
 Simon Tournier <zimon.toutoune <at> gmail.com>, Mathieu Othacehe <othacehe <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, Christopher Baines <guix <at> cbaines.net>
Subject: Re: [bug#77387] [PATCH 1/2] man-db: Parse man macro arguments better.
Date: Wed, 09 Apr 2025 14:59:55 +0200
Hi,

Ludovic Courtès <ludo <at> gnu.org> writes:
>>>> +             (in-string? #f))
>>>> +    (if (>= pos (string-length input))
>>>> +        ;; End of input
>>>> +        (unless in-string?
>>>> +          (reverse (if (null? current)
>>>> +                       tokens
>>>> +                       (cons (list->string (reverse current)) tokens))))
>>>
>>> So this procedure can return *unspecified*, right?  Sounds fishy.
>>>
>> Why is it fishy? Is it unconventional? Such a return value is handled
>> correctly by the calling code (`match`).
>
> It’s unconventional; usually, procedures are monomorphic and in this
> case, the expectation is that it always returns a list of tokens.
>
> I would either return the empty list in the ‘in-string?’ case or throw
> an exception (because that means we failed to parse the thing).
>
> Does that make sense?

I wouldn't throw an exception, as this would break the derivation
build and consequently the profile switch. I've removed the offending
`unless` altogether. `man` itself seems to be forgiving of such syntax
violations.
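
To see what dropping the `unless` means in practice, here is the v2 `man-macro-tokenize` from the patch above, copied into a standalone snippet and applied to a few made-up input lines. With an unterminated quote, whatever was collected so far simply becomes the last token.

```scheme
(define (man-macro-tokenize input)
  "Split INPUT string, a man macro invocation, into a list containing the macro's
name followed by its arguments."
  (let loop ((pos 0)
             (tokens '())
             (characters '())
             (in-string? #f))
    (if (>= pos (string-length input))
        ;; End of input
        (reverse (if (null? characters)
                     tokens
                     (cons (list->string (reverse characters)) tokens)))
        (let ((c (string-ref input pos)))
          (cond
           ;; Inside a string
           (in-string?
            (if (char=? c #\")
                (if (and (< (+ pos 1) (string-length input))
                         (char=? (string-ref input (+ pos 1)) #\"))
                    ;; Double quote inside string
                    (loop (+ pos 2) tokens (cons #\" characters) #t)
                    ;; End of string
                    (loop (+ pos 1) (cons (list->string (reverse characters)) tokens) '() #f))
                ;; Regular character in string
                (loop (+ pos 1) tokens (cons c characters) #t)))

           ;; Whitespace outside string
           ((char-whitespace? c)
            (if (null? characters)
                (loop (+ pos 1) tokens '() #f)
                (loop (+ pos 1) (cons (list->string (reverse characters)) tokens) '() #f)))

           ;; Start of string
           ((char=? c #\")
            (if (null? characters)
                (loop (+ pos 1) tokens '() #t)
                (loop pos (cons (list->string (reverse characters)) tokens) '() #f)))

           ;; Symbol character
           (else
            (loop (+ pos 1) tokens (cons c characters) #f)))))))

;; Made-up input lines, for illustration only.
(define complete     (man-macro-tokenize ".TH SSH \"1\" \"March 2025\""))
;; => (".TH" "SSH" "1" "March 2025")
(define unterminated (man-macro-tokenize ".TH FOO \"1"))
;; => (".TH" "FOO" "1")
(define doubled      (man-macro-tokenize ".Nd \"a \"\"b\"\" c\""))
;; => (".Nd" "a \"b\" c")
```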




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Fri, 11 Apr 2025 10:19:03 GMT)

Notification sent to Sergey Trofimov <sarg <at> sarg.org.ru>:
bug acknowledged by developer. (Fri, 11 Apr 2025 10:19:03 GMT)

Message #43 received at 77387-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Sergey Trofimov <sarg <at> sarg.org.ru>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>,
 Simon Tournier <zimon.toutoune <at> gmail.com>, Mathieu Othacehe <othacehe <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, 77387-done <at> debbugs.gnu.org,
 Christopher Baines <guix <at> cbaines.net>
Subject: Re: [bug#77387] [PATCH v2 1/2] man-db: Parse man macro arguments
 better.
Date: Fri, 11 Apr 2025 12:05:58 +0200
Hi,

I applied v2, thank you!

I confirmed that this goes from:

--8<---------------cut here---------------start------------->8---
$ guix describe
Generation 342  Apr 06 2025 23:07:09    (current)
  shepherd d98d61a
    repository URL: https://git.savannah.gnu.org/git/shepherd.git
    branch: main
    commit: d98d61a8a3f20de46d18ce4a8af05c93fab20b89
  guile af96820
    repository URL: https://git.savannah.gnu.org/git/guile.git
    branch: main
    commit: af96820e072d18c49ac03e80c6f3466d568dc77d
  guix 6af6806
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 6af680670bf9055b90e6f8b63c4c2ab7b08e7c56
ludo <at> ribbon ~/src/guix$ guix shell man-db openssh -C -- man -k ssh
ssh (0)              - (unknown subject)
ssh-add (0)          - (unknown subject)
ssh-agent (0)        - (unknown subject)
ssh-copy-id (0)      - (unknown subject)
ssh-keygen (0)       - (unknown subject)
ssh-keyscan (0)      - (unknown subject)
ssh-keysign (0)      - (unknown subject)
ssh-pkcs11-helper (0) - (unknown subject)
ssh-sk-helper (0)    - (unknown subject)
ssh_config (0)       - (unknown subject)
sshd (0)             - (unknown subject)
sshd_config (0)      - (unknown subject)
--8<---------------cut here---------------end--------------->8---

… to:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix shell man-db openssh -C -- man -k ssh
ssh (1)              - (unknown subject)
ssh-add (1)          - (unknown subject)
ssh-agent (1)        - (unknown subject)
ssh-copy-id (1)      - (unknown subject)
ssh-keygen (1)       - (unknown subject)
ssh-keyscan (1)      - (unknown subject)
ssh-keysign (8)      - (unknown subject)
ssh-pkcs11-helper (8) - (unknown subject)
ssh-sk-helper (8)    - (unknown subject)
ssh_config (5)       - (unknown subject)
sshd (8)             - (unknown subject)
sshd_config (5)      - (unknown subject)
--8<---------------cut here---------------end--------------->8---

… which will undoubtedly be more convenient.  :-)

Thanks!

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 09 May 2025 11:24:15 GMT)

