From debbugs-submit-bounces@debbugs.gnu.org Fri Oct 22 08:41:18 2021 Received: (at submit) by debbugs.gnu.org; 22 Oct 2021 12:41:18 +0000 Received: from localhost ([127.0.0.1]:59650 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mdtrC-00034L-IA for submit@debbugs.gnu.org; Fri, 22 Oct 2021 08:41:18 -0400 Received: from lists.gnu.org ([209.51.188.17]:60132) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mdtr8-000349-A5 for submit@debbugs.gnu.org; Fri, 22 Oct 2021 08:41:09 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45500) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mdtr3-00024Q-Hf for guix-patches@gnu.org; Fri, 22 Oct 2021 08:41:03 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:42844) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mdtr2-0004w6-HM; Fri, 22 Oct 2021 08:41:00 -0400 Received: from [193.50.110.110] (port=53678 helo=gnu.org) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mdtr2-0002WN-6r; Fri, 22 Oct 2021 08:41:00 -0400 From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= To: guix-patches@gnu.org Subject: [PATCH 0/2] Detect early and gracefully handle invalid Texinfo Date: Fri, 22 Oct 2021 14:40:52 +0200 Message-Id: <20211022124052.28197-1-ludo@gnu.org> X-Mailer: git-send-email 2.33.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: submit Cc: =?UTF-8?q?Ludovic=20Court=C3=A8s?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Hello! 
It’s a fact that we occasionally push invalid Texinfo markup in package descriptions/synopses, probably even more so in external channels, and despite the fact that ‘guix lint’ flags it. The problem is that some of the tools were designed around the idea that invalid Texinfo “does not happen”. For example, if a single package contains invalid markup, ‘guix search’ and ‘guix show’ crash badly: --8<---------------cut here---------------start------------->8--- $ guix search ghc citations name: ghc-citeproc version: 0.4.0.1 outputs: out systems: x86_64-linux i686-linux dependencies: ghc-aeson-pretty@0.8.8 ghc-aeson@1.5.6.0 ghc-attoparsec@0.13.2.5 + ghc-base-compat@0.11.2 ghc-case-insensitive@1.2.1.0 ghc-data-default@0.7.1.1 ghc-diff@0.4.0 + ghc-file-embed@0.0.15.0 ghc-pandoc-types@1.22 ghc-safe@0.3.19 ghc-scientific@0.3.7.0 + ghc-timeit@2.0 ghc-unicode-collation@0.1.3 ghc-uniplate@1.6.13 ghc-vector@0.12.3.0 + ghc-xml-conduit@1.9.1.1 location: gnu/packages/haskell-xyz.scm:15823:2 homepage: https://hackage.haskell.org/package/citeproc license: FreeBSD synopsis: Generate citations and bibliography from CSL styles Backtrace: 13 (primitive-load "/home/ludo/.config/guix/current/bin/gu…") In guix/ui.scm: 2185:7 12 (run-guix . _) 2148:10 11 (run-guix-command _ . _) In ice-9/boot-9.scm: 1752:10 10 (with-exception-handler _ _ #:unwind? _ # _) In guix/scripts/package.scm: 896:9 9 (_) In ice-9/boot-9.scm: 1747:15 8 (with-exception-handler # …) In guix/ui.scm: 1677:23 7 (call-with-paginated-output-port _ #:less-options _) 1712:11 6 (_ #) 1558:14 5 (package->recutils _ # _ # _ …) 1432:23 4 (texi->plain-text _) In texinfo.scm: 1132:22 3 (parse _) 967:36 2 (loop # (*fragment*) # …) 92:2 1 (command-spec _) In ice-9/boot-9.scm: 1685:16 0 (raise-exception _ #:continuable? _) ice-9/boot-9.scm:1685:16: In procedure raise-exception: Throw to key `parser-error' with args `(#f "Unknown command" urefhttps)'. 
--8<---------------cut here---------------end--------------->8---

(This one was fixed in c3c502896b1454b345ee9f17d20063853652a35a.)

This series does two things:

  1. Emit a warning when invalid markup is encountered but keep going.

  2. Raise a syntax error, at macro-expansion time, when invalid markup is encountered.

Obviously #2 incurs some overhead at expansion time, since it parses Texinfo strings, so it’s enabled only when ‘GUIX_UNINSTALLED’ is set, that is, when working on a checkout with ./pre-inst-env. The expanded code is exactly the same as before though, so there is no run-time overhead.

Concretely, that means that ‘make’ fails and you just don’t see the package until the error has been fixed:

--8<---------------cut here---------------start------------->8---
$ make
[…]
[ 78%] LOAD gnu/packages/haskell-xyz.scm
;;; note: source file ./gnu/packages/haskell-xyz.scm
;;; newer than compiled /home/ludo/src/guix/gnu/packages/haskell-xyz.go
;;; note: source file ./gnu/packages/haskell-xyz.scm
;;; newer than compiled /home/ludo/src/guix/gnu/packages/haskell-xyz.go
gnu/packages/haskell-xyz.scm:15855:5: error: "@code{ghc-citeproc} parses @acronym{Citation Style Language, CSL} style files\nand uses them to generate a list of formatted citations and bibliography\nentries. For more information about CSL, see @urefhttps://citationstyles.org/}.": invalid Texinfo markup
make[2]: *** [Makefile:7131: make-packages-go] Error 1
--8<---------------cut here---------------end--------------->8---

Feedback welcome!

Ludo’.

Ludovic Courtès (2):
  ui: Gracefully handle invalid Texinfo markup in package blurbs.
  packages: Optionally validate Texinfo markup at expansion time.
guix/packages.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++--- guix/ui.scm | 17 ++++++++++++++-- 2 files changed, 64 insertions(+), 5 deletions(-) base-commit: e1261ddd38cf02a0f046f3a5360502d659b4e7d4 -- 2.33.0 From debbugs-submit-bounces@debbugs.gnu.org Fri Oct 22 08:46:16 2021 Received: (at 51332) by debbugs.gnu.org; 22 Oct 2021 12:46:16 +0000 Received: from localhost ([127.0.0.1]:59656 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mdtvu-0003DM-ID for submit@debbugs.gnu.org; Fri, 22 Oct 2021 08:46:16 -0400 Received: from eggs.gnu.org ([209.51.188.92]:48978) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mdtvs-0003Cb-Oj for 51332@debbugs.gnu.org; Fri, 22 Oct 2021 08:46:01 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:42912) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mdtvk-0007Pv-T4; Fri, 22 Oct 2021 08:45:53 -0400 Received: from [193.50.110.110] (port=53680 helo=gnu.org) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mdtvT-0002o8-9J; Fri, 22 Oct 2021 08:45:52 -0400 From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= To: 51332@debbugs.gnu.org Subject: [PATCH 1/2] ui: Gracefully handle invalid Texinfo markup in package blurbs. Date: Fri, 22 Oct 2021 14:45:18 +0200 Message-Id: <20211022124519.28473-1-ludo@gnu.org> X-Mailer: git-send-email 2.33.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 51332 Cc: =?UTF-8?q?Ludovic=20Court=C3=A8s?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Previously 'guix search' & co. would crash when encountering invalid Texinfo. * guix/ui.scm (texi->plain-text*): New procedure. 
(package-field-string, package->recutils): Use it. --- guix/ui.scm | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/guix/ui.scm b/guix/ui.scm index 1428c254b3..eb7f0afcfd 100644 --- a/guix/ui.scm +++ b/guix/ui.scm @@ -1431,10 +1431,22 @@ (define (texi->plain-text str) (with-fluids ((%default-port-encoding "UTF-8")) (stexi->plain-text (texi-fragment->stexi str)))) +(define (texi->plain-text* package str) + "Same as 'texi->plain-text', but gracefully handle Texinfo errors." + (catch 'parser-error + (lambda () + (texi->plain-text str)) + (lambda args + (warning (package-location package) + (G_ "~a: invalid Texinfo markup~%") + (package-full-name package)) + str))) + (define (package-field-string package field-accessor) "Return a plain-text representation of PACKAGE field." (and=> (field-accessor package) - (compose texi->plain-text P_))) + (lambda (str) + (texi->plain-text* package (P_ str))))) (define (package-description-string package) "Return a plain-text representation of PACKAGE description field." @@ -1555,7 +1567,8 @@ (define (packageplain-text' on the concatenated string to account ;; for the width of "description:" in paragraph filling. 
- (texi->plain-text + (texi->plain-text* + p (string-append "description: " (or (and=> (package-description p) P_) "")))) -- 2.33.0 From debbugs-submit-bounces@debbugs.gnu.org Fri Oct 22 08:46:16 2021 Received: (at 51332) by debbugs.gnu.org; 22 Oct 2021 12:46:16 +0000 Received: from localhost ([127.0.0.1]:59658 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mdtw8-0003Dk-BT for submit@debbugs.gnu.org; Fri, 22 Oct 2021 08:46:16 -0400 Received: from eggs.gnu.org ([209.51.188.92]:48982) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mdtvs-0003Cc-Ow for 51332@debbugs.gnu.org; Fri, 22 Oct 2021 08:46:01 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:42924) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mdtvm-0007VH-6t; Fri, 22 Oct 2021 08:45:54 -0400 Received: from [193.50.110.110] (port=53680 helo=gnu.org) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mdtvl-0002o8-8v; Fri, 22 Oct 2021 08:45:53 -0400 From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= To: 51332@debbugs.gnu.org Subject: [PATCH 2/2] packages: Optionally validate Texinfo markup at expansion time. Date: Fri, 22 Oct 2021 14:45:19 +0200 Message-Id: <20211022124519.28473-2-ludo@gnu.org> X-Mailer: git-send-email 2.33.0 In-Reply-To: <20211022124519.28473-1-ludo@gnu.org> References: <20211022124519.28473-1-ludo@gnu.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 51332 Cc: =?UTF-8?q?Ludovic=20Court=C3=A8s?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) * guix/packages.scm (validate-texinfo): New macro. (<package>)[synopsis, description]: Add 'sanitize' property. 
--- guix/packages.scm | 52 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 49 insertions(+), 3 deletions(-) diff --git a/guix/packages.scm b/guix/packages.scm index e5a9d08bce..394f6aa39e 100644 --- a/guix/packages.scm +++ b/guix/packages.scm @@ -49,6 +49,7 @@ (define-module (guix packages) #:use-module (srfi srfi-35) #:use-module (rnrs bytevectors) #:use-module (web uri) + #:autoload (texinfo) (texi-fragment->stexi) #:re-export (%current-system %current-target-system search-path-specification) ;for convenience @@ -437,6 +438,49 @@ (define location (lambda (s) #,location))) body ...)))))) +(define-syntax validate-texinfo + (let ((validate? (getenv "GUIX_UNINSTALLED"))) + (define ensure-thread-safe-texinfo-parser! + ;; Work around for Guile <= 3.0.7. + (let ((patched? (or (> (string->number (major-version)) 3) + (> (string->number (minor-version)) 0) + (> (string->number (micro-version)) 7))) + (next-token-of/thread-safe + (lambda (pred port) + (let loop ((chars '())) + (match (read-char port) + ((? eof-object?) + (list->string (reverse! chars))) + (chr + (let ((chr* (pred chr))) + (if chr* + (loop (cons chr* chars)) + (begin + (unread-char chr port) + (list->string (reverse! chars))))))))))) + (lambda () + (unless patched? + (set! (@@ (texinfo) next-token-of) next-token-of/thread-safe) + (set! patched? #t))))) + + (lambda (s) + "Raise a syntax error when passed a literal string that is not valid +Texinfo. Otherwise, return the string." + (syntax-case s () + ((_ str) + (string? (syntax->datum #'str)) + (if validate? + (catch 'parser-error + (lambda () + (ensure-thread-safe-texinfo-parser!) + (texi-fragment->stexi (syntax->datum #'str)) + #'str) + (lambda _ + (syntax-violation 'package "invalid Texinfo markup" #'str))) + #'str)) + ((_ obj) + #'obj))))) + ;; A package. 
(define-record-type* <package> package make-package @@ -471,9 +515,11 @@ (define-record-type* <package> (replacement package-replacement ; package | #f (default #f) (thunked) (innate)) - (synopsis package-synopsis) ; one-line description - (description package-description) ; one or two paragraphs - (license package-license) ; instance or list + (synopsis package-synopsis + (sanitize validate-texinfo)) ; one-line description + (description package-description + (sanitize validate-texinfo)) ; one or two paragraphs + (license package-license) ; instance or list (home-page package-home-page) (supported-systems package-supported-systems ; list of strings (default %supported-systems)) -- 2.33.0 From debbugs-submit-bounces@debbugs.gnu.org Thu Oct 28 15:46:25 2021 Received: (at 51332-done) by debbugs.gnu.org; 28 Oct 2021 19:46:25 +0000 Received: from localhost ([127.0.0.1]:53299 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mgBM0-0002Mz-Ut for submit@debbugs.gnu.org; Thu, 28 Oct 2021 15:46:25 -0400 Received: from eggs.gnu.org ([209.51.188.92]:49234) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mgBLy-0002Mj-Cl for 51332-done@debbugs.gnu.org; Thu, 28 Oct 2021 15:46:23 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]:33762) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mgBLt-0002ml-2F for 51332-done@debbugs.gnu.org; Thu, 28 Oct 2021 15:46:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=Un2qGNBKg4OC8QIbDGKOuq6OLA1okPXfyzBkPLaM9Bw=; b=j/yt/Uz9paVrCbELlZi3 2jVO1CWcqmWhpJHwTMJMklip59YMX2FtdyNs1r/M2tCqLcJAW0GrEBs5dwSCo4yncGUZ893+8xkjU dURVrlKDLHVwHauUT5sw567G5lRKFLbPlW6WOg6zLxIc02koxXwfy6p/UWYdbtydblIuwhmqI9Ymj 1LPmeT4jFdQVWbayx1WNK6gTY9xUCOQSIvaoKsZbqyje9FdcGks4973oQPfDJ2juMCvnTYl3iz8xY 
xFqaiOB1KkX7D0WAtLDOhunfCneD3bXOMdX3hgGgzq3chm829Goyb8gsXs4oik8DIjTtPva3LlOM9 KPJrdsWbCCMIqQ==; Received: from 91-160-117-201.subs.proxad.net ([91.160.117.201]:54866 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mgBLs-0008SE-ML for 51332-done@debbugs.gnu.org; Thu, 28 Oct 2021 15:46:16 -0400 From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: 51332-done@debbugs.gnu.org Subject: Re: bug#51332: [PATCH 0/2] Detect early and gracefully handle invalid Texinfo References: <20211022124052.28197-1-ludo@gnu.org> Date: Thu, 28 Oct 2021 21:46:14 +0200 In-Reply-To: <20211022124052.28197-1-ludo@gnu.org> ("Ludovic =?utf-8?Q?Cou?= =?utf-8?Q?rt=C3=A8s=22's?= message of "Fri, 22 Oct 2021 14:40:52 +0200") Message-ID: <87fssl3pbt.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 51332-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Ludovic Courtès skribis: > ui: Gracefully handle invalid Texinfo markup in package blurbs. > packages: Optionally validate Texinfo markup at expansion time. Pushed as e171182a20962c4119e12439b92bbbfd59b1495e! Ludo'. From unknown Thu Aug 14 21:49:21 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Fri, 26 Nov 2021 12:24:05 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator
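
For anyone who wants to try the warn-and-continue behaviour of patch 1/2 outside of Guix, here is a minimal sketch that relies only on Guile's own (texinfo) and (texinfo plain-text) modules. The procedure name 'texi->plain-text/lenient', the wording of the warning and the example.org URL are made up for illustration; the actual patch goes through Guix's 'warning' procedure and reports the package location and full name instead.

```scheme
(use-modules (texinfo)              ;texi-fragment->stexi
             (texinfo plain-text))  ;stexi->plain-text

;; Hypothetical stand-in for the 'texi->plain-text*' procedure of patch 1/2:
;; it takes no package argument and writes its warning to the error port.
(define (texi->plain-text/lenient str)
  "Render STR, a Texinfo fragment, as plain text.  When STR is not valid
Texinfo, print a warning and return STR unchanged instead of crashing."
  (catch 'parser-error
    (lambda ()
      (with-fluids ((%default-port-encoding "UTF-8"))
        (stexi->plain-text (texi-fragment->stexi str))))
    (lambda (key . args)
      ;; ARGS carries what the parser threw, for instance the name of the
      ;; unknown command.
      (format (current-error-port)
              "warning: invalid Texinfo markup: ~s~%" args)
      str)))

;; Valid markup is rendered; the broken '@uref' from the report is passed
;; through verbatim after a warning.
(display (texi->plain-text/lenient "Valid @code{markup} here."))
(display (texi->plain-text/lenient "See @urefhttps://example.org/}."))
(newline)
```

Returning the raw string on error mirrors what the patch does: the user still sees the description, markup included, and the warning points at the text to fix.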