From unknown Mon Jun 16 23:42:58 2025
From: bug#40582 <40582@debbugs.gnu.org>
To: bug#40582 <40582@debbugs.gnu.org>
Subject: Status: Valid URIs are rejected
Reply-To: bug#40582 <40582@debbugs.gnu.org>
Date: Tue, 17 Jun 2025 06:42:58 +0000

retitle 40582 Valid URIs are rejected
reassign 40582 guile
submitter 40582 Julien Lepiller
severity 40582 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 12 15:44:57 2020
Date: Sun, 12 Apr 2020 15:44:31 -0400
User-Agent: K-9 Mail for Android
Subject: Valid URIs are rejected
To: bug-guile@gnu.org
From: Julien Lepiller
Message-ID: <3EFDD2B8-58F2-41E1-997B-76098A9A3715@lepiller.eu>

Hi,

Using (web uri), I was trying to parse "uri://a/c". Reading RFC3986, it should be a valid URI (see rule for reg-name in 3.2.2). However, passing it to string->uri results in #f. I've tracked this down to valid-host?
which returns #f for "a".

The reason is that the regexp checking if the host is an ipv6 matches "a", which shouldn't happen because a is not an ipv6 address. Indeed, when I try (string->uri "uri://g/b"), I get the expected result.

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 17 17:57:48 2020
From: Ludovic Courtès
To: Julien Lepiller
Cc: 40582@debbugs.gnu.org
Subject: Re: bug#40582: Valid URIs are rejected
References: <3EFDD2B8-58F2-41E1-997B-76098A9A3715@lepiller.eu>
Date: Wed, 17 Jun 2020 23:57:33 +0200
In-Reply-To: <3EFDD2B8-58F2-41E1-997B-76098A9A3715@lepiller.eu> (Julien Lepiller's message of "Sun, 12 Apr 2020 15:44:31 -0400")
Message-ID: <878sglpd82.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8

Hi Julien,

Julien Lepiller wrote:

> Using (web uri), I was trying to parse "uri://a/c". Reading RFC3986, it should be a valid URI (see rule for reg-name in 3.2.2). However, passing it to string->uri results in #f. I've tracked this down to valid-host? which returns #f for "a".
>
> The reason is that the regexp checking if the host is an ipv6 matches "a", which shouldn't happen because a is not an ipv6 address. Indeed, when I try (string->uri "uri://g/b"), I get the expected result.

Right. ‘authority-regexp’ is fine, but ‘ipv6-regexp’, used by ‘valid-host?’, was too lax and would match “a” because it’s a hex digit sequence.

The regexp below is still an approximation, but I think a better one. Can you confirm?

Thanks,
Ludo’.

--=-=-=
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: inline

diff --git a/module/web/uri.scm b/module/web/uri.scm
index b4b89b9cc..d76432737 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -188,7 +188,7 @@ for ‘build-uri’ except there is no scheme."
 (define ipv4-regexp
   (make-regexp (string-append "^([" digits ".]+)$")))
 (define ipv6-regexp
-  (make-regexp (string-append "^([" hex-digits ":.]+)$")))
+  (make-regexp (string-append "^([" hex-digits "]*:[" hex-digits ":.]+)$")))
 (define domain-label-regexp
   (make-regexp
    (string-append "^[" letters digits "]"
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 94778acac..95fd82f16 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -1,6 +1,6 @@
 ;;;; web-uri.test --- URI library          -*- mode: scheme; coding: utf-8; -*-
 ;;;;
-;;;; Copyright (C) 2010-2012, 2014, 2017, 2019 Free Software Foundation, Inc.
+;;;; Copyright (C) 2010-2012, 2014, 2017, 2019, 2020 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -179,6 +179,13 @@
                      #:port 22
                      #:path "/baz"))
 
+  (pass-if-equal "xyz://abc/x/y/z"     ;
+      (list 'xyz "abc" "/x/y/z")
+    (let ((uri (string->uri "xyz://abc/x/y/z")))
+      (list (uri-scheme uri)
+            (uri-host uri)
+            (uri-path uri))))
+
   (pass-if "http://bad.host.1"
     (not (string->uri "http://bad.host.1")))
 

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 17 21:20:11 2020
Date: Wed, 17 Jun 2020 21:17:11 -0400
User-Agent: K-9
 Mail for Android
In-Reply-To: <878sglpd82.fsf@gnu.org>
References: <3EFDD2B8-58F2-41E1-997B-76098A9A3715@lepiller.eu> <878sglpd82.fsf@gnu.org>
Subject: Re: bug#40582: Valid URIs are rejected
To: Ludovic Courtès
Cc: 40582@debbugs.gnu.org
From: Julien Lepiller
Message-ID: <1CE6246D-B9ED-46F7-83F3-A51F0797C991@lepiller.eu>

On 17 June 2020 17:57:33 GMT-04:00, "Ludovic Courtès" wrote:
>Hi Julien,
>
>Julien Lepiller wrote:
>
>> Using (web uri), I was trying to parse "uri://a/c". Reading RFC3986, it should be a valid URI (see rule for reg-name in 3.2.2). However, passing it to string->uri results in #f. I've tracked this down to valid-host? which returns #f for "a".
>>
>> The reason is that the regexp checking if the host is an ipv6 matches "a", which shouldn't happen because a is not an ipv6 address. Indeed, when I try (string->uri "uri://g/b"), I get the expected result.
>
>Right. ‘authority-regexp’ is fine, but ‘ipv6-regexp’, used by ‘valid-host?’, was too lax and would match “a” because it’s a hex digit sequence.
>
>The regexp below is still an approximation, but I think a better one.
>Can you confirm?
>
>Thanks,
>Ludo’.

Looks slightly better, thanks.

That's still incorrect, as it will match things that are not ipv6 addresses. Does it have to be a regexp though? Why not simply check (false-if-exception (inet-pton AF_INET6 host)), as in the return value of valid-host?
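
A minimal sketch of the check suggested above, assuming a Guile build with IPv6 support. The name `ipv6-address?` is hypothetical and is not defined in (web uri); only `inet-pton`, `AF_INET6` and `false-if-exception` come from Guile itself.

```scheme
;; Sketch only: `ipv6-address?' is a hypothetical helper, not part of (web uri).
;; `inet-pton' raises an exception when the text is not a valid address of
;; the given family; `false-if-exception' turns that exception into #f.
(define (ipv6-address? host)
  (and (false-if-exception (inet-pton AF_INET6 host))
       #t))

(ipv6-address? "::1")   ; => #t
(ipv6-address? "a")     ; => #f
```

The `(and ... #t)` wrapper only normalizes the integer returned by `inet-pton` to a boolean.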

There's also an ipv6-host-pat that has an incorrect regexp, but I'm not sure what it is used for.

From debbugs-submit-bounces@debbugs.gnu.org Thu Jun 18 11:08:08 2020
From: Ludovic Courtès
To: Julien Lepiller
Subject: Re: bug#40582: Valid URIs are rejected
References: <3EFDD2B8-58F2-41E1-997B-76098A9A3715@lepiller.eu> <878sglpd82.fsf@gnu.org> <1CE6246D-B9ED-46F7-83F3-A51F0797C991@lepiller.eu>
Date: Thu, 18 Jun 2020 17:07:57 +0200
In-Reply-To: <1CE6246D-B9ED-46F7-83F3-A51F0797C991@lepiller.eu> (Julien Lepiller's message of "Wed, 17 Jun 2020 21:17:11 -0400")
Message-ID: <871rmcjtte.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
X-Debbugs-Envelope-To:
 40582-done
Cc: 40582-done@debbugs.gnu.org

Hi,

Julien Lepiller wrote:

> On 17 June 2020 17:57:33 GMT-04:00, "Ludovic Courtès" wrote:

[...]

>>The regexp below is still an approximation, but I think a better one.
>>Can you confirm?
>>
>>Thanks,
>>Ludo’.
>
> Looks slightly better, thanks.
>
> That's still incorrect, as it will match things that are not ipv6 addresses. Does it have to be a regexp though? Why not simply check (false-if-exception (inet-pton AF_INET6 host)), as in the return value of valid-host?

Using a regexp makes the code closer to the RFC since the RFC explicitly describes the grammar. It’s also the simple choice here.

> There's also an ipv6-host-pat that has an incorrect regexp, but I'm not sure what it is used for.

It’s used for ‘authority-regexp’, but that one is fine: it requires square brackets around IPv6 addresses.

Pushed as 1ab2105339f60dba20c8c9680e49110501f3a6a0.

Thanks,
Ludo’.

From unknown Mon Jun 16 23:42:58 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Fri, 17 Jul 2020 11:24:08 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
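
The two `ipv6-regexp` patterns from the patch in this thread can be compared in isolation. The sketch below is illustrative only: `hex-digits` is redefined locally (its value is an assumption about what (web uri) uses), and `old-ipv6-regexp`, `new-ipv6-regexp` and `matches?` are hypothetical names introduced for the comparison.

```scheme
;; Assumed value; (web uri) defines its own `hex-digits'.
(define hex-digits "0123456789ABCDEFabcdef")

;; Pattern before the patch: any run of hex digits, colons and dots.
(define old-ipv6-regexp
  (make-regexp (string-append "^([" hex-digits ":.]+)$")))

;; Pattern after the patch: at least one colon is required.
(define new-ipv6-regexp
  (make-regexp (string-append "^([" hex-digits "]*:[" hex-digits ":.]+)$")))

;; Hypothetical helper: normalize the match object to a boolean.
(define (matches? rx str)
  (and (regexp-exec rx str) #t))

(matches? old-ipv6-regexp "a")    ; => #t, the false positive reported here
(matches? new-ipv6-regexp "a")    ; => #f
(matches? new-ipv6-regexp "::1")  ; => #t
(matches? new-ipv6-regexp ":::")  ; => #t, so the pattern remains an approximation
```

The last call shows the kind of input that the patched pattern still accepts although it is not an IPv6 address.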