From: Einar Largenius
To: bug-guix@gnu.org
Subject: guix won't download if locale is set to swedish
Date: Fri, 17 May 2019 22:03:53 +0200

Hello.

I just downloaded guix and installed it. In my config I have this line:

  (locale "sv_SE.utf8")

If I run 'guix pull' I get the error:

  guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi-3.2.1.tar.gz"

The part in Swedish means "file or directory does not exist".

'LANG= guix pull' works without issue.

From: Ludovic Courtès
To: Einar Largenius
Subject: Re: bug#35785: guix won't download if locale is set to swedish
Date: Sat, 18 May 2019 13:55:20 +0200
Cc: 35785@debbugs.gnu.org

Hello Einar,

Einar Largenius wrote:

> I just downloaded guix and installed it. In my config I have this line:
>
>   (locale "sv_SE.utf8")
>
> If I run 'guix pull' I get the error:
>
>   guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi-3.2.1.tar.gz"
>
> The part in Swedish means "file or directory does not exist".

Could you paste the complete output of ‘guix pull -v2’ when running
under that locale?

Thanks,
Ludo’.

From: Einar Largenius
To: Ludovic Courtès
Cc: 35785@debbugs.gnu.org
Subject: Re: bug#35785: guix won't download if locale is set to swedish
Date: Sun, 19 May 2019 19:45:11 +0200

> Could you paste the complete output of ‘guix pull -v2’ when running
> under that locale?

Yes, sorry. I have not set up email on that system yet, so I need to
manually transcribe any output. This should be the complete output:

  Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
  Building from this channel:
    guix      https://git.savannah.gnu.org/git/guix.git  f5557bd
  guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi-3.2.1.tar.gz"

From: Ludovic Courtès
To: Einar Largenius
Subject: Re: bug#35785: guix won't download if locale is set to swedish
Date: Mon, 20 May 2019 10:20:37 +0200
Cc: 35785@debbugs.gnu.org

Einar Largenius wrote:

>> Could you paste the complete output of ‘guix pull -v2’ when running
>> under that locale?
>
> Yes, sorry. I have not set up email on that system yet, so I need to
> manually transcribe any output. This should be the complete output:
>
>   Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
>   Building from this channel:
>     guix      https://git.savannah.gnu.org/git/guix.git  f5557bd
>   guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi-3.2.1.tar.gz"

I can reproduce it:

--8<---------------cut here---------------start------------->8---
$ export GUIX_LOCPATH=$(guix build glibc-locales)/lib/locale
$ LANGUAGE= LC_ALL=sv_SE.utf8 guix pull -p foo
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Building from this channel:
  guix      https://git.savannah.gnu.org/git/guix.git  0f469c1
guix pull: error: lstat: Filen eller katalogen finns inte: "ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz"
--8<---------------cut here---------------end--------------->8---

Super weird!  Investigating…

Ludo’.

From: Ludovic Courtès
To: Einar Largenius
Subject: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 20 May 2019 11:14:04 +0200
Cc: 35785@debbugs.gnu.org

Hi!

So the guts of the problem is that Guile’s ‘string->uri’ procedure
behaves incorrectly under that locale:

--8<---------------cut here---------------start------------->8---
$ export GUIX_LOCPATH=$(guix build glibc-locales)/lib/locale
$ LANGUAGE= LC_ALL=sv_SE.utf8 ./pre-inst-env guile
GNU Guile 2.2.4
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> ,use(web uri)
scheme@(guile-user)> (string->uri "ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz")
$1 = #f
--8<---------------cut here---------------end--------------->8---

More specifically, ‘parse-authority’ is failing under that locale,
because of the “w”:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourceware.org" (const 'fail))
$5 = fail
scheme@(guile-user)> ((@@ (web uri) parse-authority) "//sourcevare.org" (const 'fail))
$6 = #f
$7 = "sourcevare.org"
$8 = #f
--8<---------------cut here---------------end--------------->8---

We can boil it down to this example:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(ice-9 regex)
scheme@(guile-user)> (string-match "[a-z]" "a")
$10 = #("a" (0 . 1))
scheme@(guile-user)> (string-match "[a-z]" "w")
$11 = #f
--8<---------------cut here---------------end--------------->8---

In short, under the sv_SE.utf8 locale of glibc 2.28, “w” is not
considered part of the ‘a-z’ interval.  Indeed,
‘localedata/locales/sv_SE’ in glibc reads this:

  % The letter w is normally not present in the Swedish alphabet. It
  % exists in some names in Swedish and foreign words, but is accounted
  % for as a variant of 'v'.  Words and names with 'w' are in Swedish
  % ordered alphabetically among the words and names with 'v'. If two
  % words or names are only to be distinguished by 'v' or % 'w', 'v' is
  % placed before 'w'.

Using the “lower” regexp class instead of “[a-z]” works:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (string-match "[[:lower:]]" "w")
$12 = #("w" (0 . 1))
--8<---------------cut here---------------end--------------->8---

However, it’s not clear to me whether the “lower” class is supposed to
be the same for all locales or if we’re just lucky:

  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

Thoughts?

The workaround until we’ve fixed it is to use another locale, though you
can still set “LC_MESSAGES=sv_SE.utf8” or “LANGUAGE=sv”.

Ludo’.
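
The boiled-down example in the message above can be turned into a small
probe that reports which ASCII lowercase letters a bracket expression
fails to match under whatever locale Guile is currently running in.  This
is only a sketch: ‘letters-missed-by’ is a hypothetical helper, not part
of Guile, and it assumes that the locale from the environment is in
effect, as in the REPL sessions shown in that message.

```scheme
(use-modules (srfi srfi-1))             ; filter

;; The 26 ASCII lowercase letters, listed one by one.
(define ascii-lower "abcdefghijklmnopqrstuvwxyz")

;; Hypothetical helper: return the letters of ASCII-LOWER that the POSIX
;; regexp PATTERN does not match under the current locale.
(define (letters-missed-by pattern)
  (let ((rx (make-regexp pattern)))
    (filter (lambda (c)
              (not (regexp-exec rx (string c))))
            (string->list ascii-lower))))

;; Compare the range, the character class, and the explicit list.
(for-each (lambda (pattern)
            (format #t "~a misses: ~s~%" pattern (letters-missed-by pattern)))
          (list "[a-z]"
                "[[:lower:]]"
                (string-append "[" ascii-lower "]")))
```

Going by the results reported in the thread, one would expect “w” to
show up for “[a-z]” under sv_SE.utf8 with glibc 2.28 and nothing for
“[[:lower:]]”.  The explicit list is not tested anywhere in this thread,
so its line of output is the interesting one to check.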

From: Ludovic Courtès
To: control@debbugs.gnu.org
Subject: control message for bug #35785
Date: Mon, 20 May 2019 11:14:35 +0200

retitle 35785 'string->uri' fails in sv_SE locale

From: Ludovic Courtès
To: control@debbugs.gnu.org
Subject: control message for bug #35785
Date: Mon, 20 May 2019 11:16:40 +0200

severity 35785 important

From: Ricardo Wurmus
To: Ludovic Courtès
Cc: 35785@debbugs.gnu.org, Einar Largenius
Subject: Re: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 27 May 2019 13:05:29 +0200

Ludovic Courtès writes:

> Using the “lower” regexp class instead of “[a-z]” works:
>
> --8<---------------cut here---------------start------------->8---
> scheme@(guile-user)> (string-match "[[:lower:]]" "w")
> $12 = #("w" (0 . 1))
> --8<---------------cut here---------------end--------------->8---
>
> However, it’s not clear to me whether the “lower” class is supposed to
> be the same for all locales or if we’re just lucky:
>
>   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>
> Thoughts?

The lower class is much larger than [a-z].  If we only wanted to work
around this particular problem we could explicitly spell out the range,
which would be the same in all locales.  (Obviously, that wouldn’t be
pretty.)

But can’t URI parts contain more than those characters?  To circumvent
the question whether the lower class is locale dependent we could
generate an explicit range from a charset.

--
Ricardo

From: Timothy Sample
To: Ricardo Wurmus
Cc: 35785@debbugs.gnu.org, Ludovic Courtès, Einar Largenius
Subject: Re: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 27 May 2019 09:39:03 -0400

Hello,

Ricardo Wurmus writes:

> Ludovic Courtès writes:
>
>> Using the “lower” regexp class instead of “[a-z]” works:
>>
>> --8<---------------cut here---------------start------------->8---
>> scheme@(guile-user)> (string-match "[[:lower:]]" "w")
>> $12 = #("w" (0 . 1))
>> --8<---------------cut here---------------end--------------->8---
>>
>> However, it’s not clear to me whether the “lower” class is supposed to
>> be the same for all locales or if we’re just lucky:
>>
>>   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>>
>> Thoughts?
>
> The lower class is much larger than [a-z].  If we only wanted to work
> around this particular problem we could explicitly spell out the range,
> which would be the same in all locales.  (Obviously, that wouldn’t be
> pretty.)

I think that explicitly spelling out the range is the right thing to do
here.  The POSIX spec says that character ranges work in the POSIX
locale, but “in other locales, a range expression has unspecified
behavior.”

> But can’t URI parts contain more than those characters?

A quick reading of RFC 3986 suggests that the host part of a URI can be
an IP address (version 4 or 6) or a registered name.  It gives the
following rules for registered names:

    reg-name    = *( unreserved / pct-encoded / sub-delims )
    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    pct-encoded = "%" HEXDIG HEXDIG
    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="

Here, “ALPHA”, “DIGIT”, and “HEXDIG” are specified in RFC 2234, and are
just the ASCII ranges you might expect (except that “HEXDIG” only
allows uppercase letters).  It looks like Guile is currently a little
stricter than this, but pretty close (if you take the character ranges
to mean ASCII ranges).

> To circumvent
> the question whether the lower class is locale dependent we could
> generate an explicit range from a charset.

I think this is the right approach.  Using “[:lower:]” would allow
things outside of the RFC, like ‘é’.  Adding support for
internationalized domain names using Punycode would be cool, but well
outside the scope of this bug.
:)


-- Tim

From: Ludovic Courtès
To: Timothy Sample
Cc: Ricardo Wurmus, 35785@debbugs.gnu.org, Einar Largenius
Subject: Re: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Tue, 28 May 2019 13:17:15 +0200

Hi Timothy,

Timothy Sample writes:

> A quick reading of RFC 3986 suggests that the host part of a URI can be
> an IP address (version 4 or 6) or a registered name. It gives the
> following rules for registered names:
>
>     reg-name    = *( unreserved / pct-encoded / sub-delims )
>     unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
>     pct-encoded = "%" HEXDIG HEXDIG
>     sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
>                 / "*" / "+" / "," / ";" / "="
>
> Here, “ALPHA”, “DIGIT”, and “HEXDIG” are specified in RFC 2234, and are
> just the ASCII ranges you might expect (except that “HEXDIG” only
> allows uppercase letters).

Do you think you could turn that into a patch for Guile? I’d happily
apply it. :-)

It looks like both [[:alnum:]] & co. and ranges would be
locale-dependent, so my understanding is that we’ll have to list all the
characters explicitly, right?

Thanks,
Ludo’.
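
As a rough illustration of what listing the characters explicitly could
look like for the reg-name rule quoted above, here is a minimal Guile
sketch. It is an example under stated assumptions, not the patch
discussed in this thread: the names `sub-delims`,
`unreserved-punctuation`, `reg-name-regexp`, and `reg-name?` are
hypothetical, and the sketch accepts lowercase hex digits in
percent-encodings, which is looser than a strict reading of HEXDIG.

```scheme
;; Explicit character lists.  Unlike "[a-z]" or "[0-9]", these do not
;; rely on the collation order of the locale in effect when the
;; regexp is compiled.
(define letters "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
(define digits "0123456789")
(define hex-digits "0123456789ABCDEFabcdef")

;; RFC 3986 sub-delims, and the unreserved punctuation other than "-".
;; The hyphen is left out of these strings and placed last in the
;; bracket expression below so that it is taken literally.
(define sub-delims "!$&'()*+,;=")
(define unreserved-punctuation "._~")

;; Hypothetical pattern for the rule
;;   reg-name = *( unreserved / pct-encoded / sub-delims )
(define reg-name-regexp
  (make-regexp
   (string-append
    "^([" letters digits unreserved-punctuation sub-delims "-]"
    "|%[" hex-digits "][" hex-digits "])*$")))

;; Return a match object if STR is a valid reg-name, #f otherwise.
(define (reg-name? str)
  (regexp-exec reg-name-regexp str))
```

Under these assumptions, `(reg-name? "www.example.com")` should return a
match object and `(reg-name? "50%")` should return `#f`, since the
pattern contains no range expression whose meaning could vary with the
locale.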
From: Timothy Sample
To: Ludovic Courtès
Cc: Ricardo Wurmus, 35785@debbugs.gnu.org, Einar Largenius
Subject: Re: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Sun, 02 Jun 2019 20:39:16 -0400

Hi,

Ludovic Courtès writes:

> Hi Timothy,
>
> Timothy Sample writes:
>
>> A quick reading of RFC 3986 suggests that the host part of a URI can be
>> an IP address (version 4 or 6) or a registered name. It gives the
>> following rules for registered names:
>>
>>     reg-name    = *( unreserved / pct-encoded / sub-delims )
>>     unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
>>     pct-encoded = "%" HEXDIG HEXDIG
>>     sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
>>                 / "*" / "+" / "," / ";" / "="
>>
>> Here, “ALPHA”, “DIGIT”, and “HEXDIG” are specified in RFC 2234, and are
>> just the ASCII ranges you might expect (except that “HEXDIG” only
>> allows uppercase letters).
>
> Do you think you could turn that into a patch for Guile? I’d happily
> apply it. :-)
>
> It looks like both [[:alnum:]] & co. and ranges would be
> locale-dependent, so my understanding is that we’ll have to list all the
> characters explicitly, right?

Here’s a patch for Guile that uses explicit lists of characters in the
‘(web uri)’ module instead of character ranges. It includes two tests
that are pretty verbose, but seem to do the trick.

I have a bit more background on the problem, mostly coming from a Glibc
bug report: .

It turns out that it is well-known upstream, and avoiding character
ranges is the recommended approach for now. Some other GNU tools have
adopted what is being called the “Rational Range Interpretation”
.
AIUI, this means they use the underlying encoding numbers for ranges (I
checked the source, but I’m only mostly sure I read it right). It looks
like the Glibc folks are unsure how to proceed on this (but are maybe
slightly leaning towards the “rational” approach).

It’s all a pretty big mess, really. I was hoping there would be some
obvious thing that would fix the problem more generally. Short of
pulling in the Gnulib regex code or writing something in Scheme, it
looks like Guile is stuck where it is now.

I’m unsure if the changes are considered “trivial” from a copyright
perspective. It’s pretty close, but I think programmers tend to
underestimate here. I’ve started the FSF copyright assignment process
either way, since this is likely not my last Guile patch.
:)


-- Tim

[Attachment: 0001-Make-URI-handling-locale-independent.patch]

From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
From: Timothy Sample
Date: Sun, 2 Jun 2019 14:41:20 -0400
Subject: [PATCH] Make URI handling locale independent.

Fixes .

* module/web/uri.scm (digits, hex-digits, letters): New variables.
(ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
list each character instead of using character ranges.
* test-suite/tests/web-uri.test: Add corresponding tests.
---
 module/web/uri.scm            | 31 +++++++++++++++++++++----------
 test-suite/tests/web-uri.test | 29 ++++++++++++++++++++++++++---
 2 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/module/web/uri.scm b/module/web/uri.scm
index 4c6fa5051..b4b89b9cc 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -1,6 +1,6 @@
 ;;;; (web uri) --- URI manipulation tools
 ;;;;
-;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014 Free Software Foundation, Inc.
+;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -175,17 +175,28 @@ for ‘build-uri’ except there is no scheme."
 ;;; Converters.
 ;;;
 
+;; Since character ranges in regular expressions may depend on the
+;; current locale, we use explicit lists of characters instead. See
+;; for details.
+(define digits "0123456789")
+(define hex-digits "0123456789ABCDEFabcdef")
+(define letters "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
+
 ;; See RFC 3986 #3.2.2 for comments on percent-encodings, IDNA (RFC
 ;; 3490), and non-ASCII host names.
 ;;
 (define ipv4-regexp
-  (make-regexp "^([0-9.]+)$"))
+  (make-regexp (string-append "^([" digits ".]+)$")))
 (define ipv6-regexp
-  (make-regexp "^([0-9a-fA-F:.]+)$"))
+  (make-regexp (string-append "^([" hex-digits ":.]+)$")))
 (define domain-label-regexp
-  (make-regexp "^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$"))
+  (make-regexp
+   (string-append "^[" letters digits "]"
+                  "([" letters digits "-]*[" letters digits "])?$")))
 (define top-label-regexp
-  (make-regexp "^[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?$"))
+  (make-regexp
+   (string-append "^[" letters "]"
+                  "([" letters digits "-]*[" letters digits "])?$")))
 
 (define (valid-host? host)
   (cond
@@ -203,13 +214,13 @@ for ‘build-uri’ except there is no scheme."
                (regexp-exec top-label-regexp host start)))))))
 
 (define userinfo-pat
-  "[a-zA-Z0-9_.!~*'();:&=+$,-]+")
+  (string-append "[" letters digits "_.!~*'();:&=+$,-]+"))
 (define host-pat
-  "[a-zA-Z0-9.-]+")
+  (string-append "[" letters digits ".-]+"))
 (define ipv6-host-pat
-  "[0-9a-fA-F:.]+")
+  (string-append "[" hex-digits ":.]+"))
 (define port-pat
-  "[0-9]*")
+  (string-append "[" digits "]*"))
 (define authority-regexp
   (make-regexp
    (format #f "^//((~a)@)?((~a)|(\\[(~a)\\]))(:(~a))?$"
@@ -246,7 +257,7 @@ for ‘build-uri’ except there is no scheme."
 ;;; either.
 
 (define scheme-pat
-  "[a-zA-Z][a-zA-Z0-9+.-]*")
+  (string-append "[" letters "][" letters digits "+.-]*"))
 (define authority-pat
   "[^/?#]*")
 (define path-pat
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 73391898c..ef8e85eba 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -1,6 +1,6 @@
 ;;;; web-uri.test --- URI library          -*- mode: scheme; coding: utf-8; -*-
 ;;;;
-;;;; Copyright (C) 2010-2012, 2014, 2017 Free Software Foundation, Inc.
+;;;; Copyright (C) 2010-2012, 2014, 2017, 2019 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -121,7 +121,18 @@
 
   (pass-if-uri-exception "http://foo@"
                          "Expected.*host"
-                         (build-uri 'http #:userinfo "foo")))
+                         (build-uri 'http #:userinfo "foo"))
+
+  (pass-if-uri-exception "http://illégal.com"
+                         "Expected.*host"
+                         (dynamic-wind
+                           (lambda () #t)
+                           (lambda ()
+                             (with-locale "en_US.utf8"
+                               (reload-module (resolve-module '(web uri)))
+                               (build-uri 'http #:host "illégal.com")))
+                           (lambda ()
+                             (reload-module (resolve-module '(web uri)))))))
 
 (with-test-prefix "build-uri-reference"
   (pass-if "//host/etc/foo"
@@ -290,7 +301,19 @@
                      #:port 100
                      #:path "/"
                      #:query "q"
-                     #:fragment "bar")))
+                     #:fragment "bar"))
+
+  ;; bug #35785
+  (pass-if "http://www.example.com (sv_SE)"
+    (dynamic-wind
+      (lambda () #t)
+      (lambda ()
+        (with-locale "sv_SE.utf8"
+          (reload-module (resolve-module '(web uri)))
+          (uri=? (string->uri "http://www.example.com")
+                 #:scheme 'http #:host "www.example.com" #:path "")))
+      (lambda ()
+        (reload-module (resolve-module '(web uri)))))))
 
 (with-test-prefix "string->uri-reference"
   (pass-if "/foo"
-- 
2.21.0

From: Ludovic Courtès
To: Timothy Sample
Cc: Ricardo Wurmus, 35785@debbugs.gnu.org, Einar Largenius
Subject: Re: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 03 Jun 2019 15:01:45 +0200

Hi Timothy,

Timothy Sample writes:

> Here’s a patch for Guile that uses explicit lists of characters in the
> ‘(web uri)’ module instead of character ranges. It includes two tests
> that are pretty verbose, but seem to do the trick.
>
> I have a bit more background on the problem, mostly coming from a Glibc
> bug report: .
>
> It turns out that it is well-known upstream, and avoiding character
> ranges is the recommended approach for now. Some other GNU tools have
> adopted what is being called the “Rational Range Interpretation”
> .
> AIUI, this means they use the underlying encoding numbers for ranges (I
> checked the source, but I’m only mostly sure I read it right). It looks
> like the Glibc folks are unsure how to proceed on this (but are maybe
> slightly leaning towards the “rational” approach).

Great that you gleaned good references on this topic!

> It’s all a pretty big mess, really. I was hoping there would be some
> obvious thing that would fix the problem more generally. Short of
> pulling in the Gnulib regex code or writing something in Scheme, it
> looks like Guile is stuck where it is now.

Yeah. The alternative would be to not use regexps in this context, I
guess.

> I’m unsure if the changes are considered “trivial” from a copyright
> perspective. It’s pretty close, but I think programmers tend to
> underestimate here. I’ve started the FSF copyright assignment process
> either way, since this is likely not my last Guile patch. :)

If the process is already underway, I think it’s fine to commit this
patch (I would rather wait if it were longer and/or if we didn’t know
each other already).

> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
> From: Timothy Sample
> Date: Sun, 2 Jun 2019 14:41:20 -0400
> Subject: [PATCH] Make URI handling locale independent.
>
> Fixes .
>
> * module/web/uri.scm (digits, hex-digits, letters): New variables.
> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
> list each character instead of using character ranges.
> * test-suite/tests/web-uri.test: Add corresponding tests.

[...]

> +  (pass-if "http://www.example.com (sv_SE)"
> +    (dynamic-wind
> +      (lambda () #t)
> +      (lambda ()
> +        (with-locale "sv_SE.utf8"
> +          (reload-module (resolve-module '(web uri)))
> +          (uri=? (string->uri "http://www.example.com")
> +                 #:scheme 'http #:host "www.example.com" #:path "")))

Aren’t ‘reload-module’ calls a leftover that can now be removed (also in
the other test)?

For the sv_SE test, what about taking a host name with a ‘w’, since
that’s the use case that allowed us to uncover this bug?

Apart from that it LGTM, thank you!

Ludo’.

From: Timothy Sample
To: Ludovic Courtès
Cc: Ricardo Wurmus, 35785@debbugs.gnu.org, Einar Largenius
Subject: Re: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Mon, 03 Jun 2019 10:24:40 -0400

Hi Ludo,

Ludovic Courtès writes:

> Hi Timothy,
>
> Timothy Sample writes:
>
>> Here’s a patch for Guile that uses explicit lists of characters in the
>> ‘(web uri)’ module instead of character ranges. It includes two tests
>> that are pretty verbose, but seem to do the trick.
>>
>> I have a bit more background on the problem, mostly coming from a Glibc
>> bug report: .
>>
>> It turns out that it is well-known upstream, and avoiding character
>> ranges is the recommended approach for now. Some other GNU tools have
>> adopted what is being called the “Rational Range Interpretation”
>> .
>> AIUI, this means they use the underlying encoding numbers for ranges (I
>> checked the source, but I’m only mostly sure I read it right). It looks
>> like the Glibc folks are unsure how to proceed on this (but are maybe
>> slightly leaning towards the “rational” approach).
>
> Great that you gleaned good references on this topic!
>
>> It’s all a pretty big mess, really. I was hoping there would be some
>> obvious thing that would fix the problem more generally. Short of
>> pulling in the Gnulib regex code or writing something in Scheme, it
>> looks like Guile is stuck where it is now.
>
> Yeah. The alternative would be to not use regexps in this context, I
> guess.

I meant fixing regexes in other contexts, since I’m sure the URI module
is not the only Guile code ever that assumed “[a-z]” would only match
ASCII lowercase letters.

>> I’m unsure if the changes are considered “trivial” from a copyright
>> perspective. It’s pretty close, but I think programmers tend to
>> underestimate here. I’ve started the FSF copyright assignment process
>> either way, since this is likely not my last Guile patch. :)
>
> If the process is already underway, I think it’s fine to commit this
> patch (I would rather wait if it were longer and/or if we didn’t know
> each other already).

Sounds good!

>> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
>> From: Timothy Sample
>> Date: Sun, 2 Jun 2019 14:41:20 -0400
>> Subject: [PATCH] Make URI handling locale independent.
>>
>> Fixes .
>>
>> * module/web/uri.scm (digits, hex-digits, letters): New variables.
>> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
>> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
>> list each character instead of using character ranges.
>> * test-suite/tests/web-uri.test: Add corresponding tests.
>
> [...]
>
>> +  (pass-if "http://www.example.com (sv_SE)"
>> +    (dynamic-wind
>> +      (lambda () #t)
>> +      (lambda ()
>> +        (with-locale "sv_SE.utf8"
>> +          (reload-module (resolve-module '(web uri)))
>> +          (uri=? (string->uri "http://www.example.com")
>> +                 #:scheme 'http #:host "www.example.com" #:path "")))
>
> Aren’t ‘reload-module’ calls a leftover that can now be removed (also in
> the other test)?

I needed to reload the modules like that to make the tests fail without
the patch and pass with it. My understanding is that the bug happens
at regex compile time, which happens when the module is loaded. If I
don’t reload the module, the old URI code passes the tests, since the
regexes were compiled with a locale that does not trigger the bug. It’s
a little wacky, sure, but it was the best idea I could come up with.

> For the sv_SE test, what about taking a host name with a ‘w’, since
> that’s the use case that allowed us to uncover this bug?

I thought I was being clever by using a “www” hostname, but apparently
it’s so normalized as to be invisible! Feel free to change it to
something more obvious like “w.com” or whatever.


-- Tim

From: Ludovic Courtès
To: Timothy Sample
Cc: Ricardo Wurmus, 35785@debbugs.gnu.org, Einar Largenius
Subject: Re: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
Date: Tue, 04 Jun 2019 09:42:55 +0200

Hello,

Timothy Sample writes:

>>> From 7b02be4c050c7b17a0e2685e8e453295f798c360 Mon Sep 17 00:00:00 2001
>>> From: Timothy Sample
>>> Date: Sun, 2 Jun 2019 14:41:20 -0400
>>> Subject: [PATCH] Make URI handling locale independent.
>>>
>>> Fixes .
>>>
>>> * module/web/uri.scm (digits, hex-digits, letters): New variables.
>>> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
>>> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
>>> list each character instead of using character ranges.
>>> * test-suite/tests/web-uri.test: Add corresponding tests.
>>
>> [...]
>>
>>> +  (pass-if "http://www.example.com (sv_SE)"
>>> +    (dynamic-wind
>>> +      (lambda () #t)
>>> +      (lambda ()
>>> +        (with-locale "sv_SE.utf8"
>>> +          (reload-module (resolve-module '(web uri)))
>>> +          (uri=? (string->uri "http://www.example.com")
>>> +                 #:scheme 'http #:host "www.example.com" #:path "")))
>>
>> Aren’t ‘reload-module’ calls a leftover that can now be removed (also in
>> the other test)?
>
> I needed to reload the modules like that to make the tests fail without
> the patch and pass with it. My understanding is that the bug happens
> at regex compile time, which happens when the module is loaded. If I
> don’t reload the module, the old URI code passes the tests, since the
> regexes were compiled with a locale that does not trigger the bug. It’s
> a little wacky, sure, but it was the best idea I could come up with.

Oooh, I see. Could you add a comment to explain this? Then we’re done.

>> For the sv_SE test, what about taking a host name with a ‘w’, since
>> that’s the use case that allowed us to uncover this bug?
>
> I thought I was being clever by using a “www” hostname, but apparently
> it’s so normalized as to be invisible! Feel free to change it to
> something more obvious like “w.com” or whatever.

Silly me, I guess I need new glasses. :-)

Thanks!

Ludo’.
 WaNQcnd1mMhbrXn1QHAit43NrLcfosPsgS1ZD3XZLL86JVfIckTDHo2OntsUDjvV
 O4euxVvjkBeQnmNvSETtt6B44cPPJqiHp/sDukiDd98dsP59ZUaR/KAefCvkqDyQ
 9x3Y+dO8/Ko9BdWtuRZUsmXv9hSZKiteT4g1tj8Myi2TiuKkBVX8sRV6cRkcvUi0
 gSHA8n4Fg2L3eZv2iXDzx9AupUb8y7wrrJs+vxSVGc+ahIEKreDPkfnZ8ZvQdx2g
 ==
X-ME-Sender:
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeduuddrudefledgjeduucetufdoteggodetrfdotf
 fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen
 uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne
 cujfgurhephffvufhfffgjkfgfgggtsehmtderredtreejnecuhfhrohhmpefvihhmohht
 hhihucfurghmphhlvgcuoehsrghmphhlvghtsehnghihrhhordgtohhmqeenucfkphepje
 egrdduudeirddukeeirdeggeenucfrrghrrghmpehmrghilhhfrhhomhepshgrmhhplhgv
 thesnhhghihrohdrtghomhenucevlhhushhtvghrufhiiigvpedt
X-ME-Proxy:
Received: from mrblack (74-116-186-44.qc.dsl.ebox.net [74.116.186.44]) by mail.messagingengine.com (Postfix) with ESMTPA id 08CEC8005A; Tue, 4 Jun 2019 09:56:39 -0400 (EDT)
From: Timothy Sample
To: Ludovic Courtès
Subject: Re: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
References: <878sv4j1au.fsf@gmail.com> <87d0kgvuxj.fsf@gnu.org> <87tvdqgwyg.fsf@gmail.com> <87blzxwkrn.fsf_-_@gnu.org> <87ftp017k6.fsf@elephly.net> <875zpw6mq0.fsf@ngyro.com> <8736ky3k1w.fsf@gnu.org> <87imtnsdsb.fsf@ngyro.com> <871s0ahlfq.fsf@gnu.org> <87ef4asq53.fsf@ngyro.com> <87imtlhk3k.fsf@gnu.org>
Date: Tue, 04 Jun 2019 09:56:39 -0400
In-Reply-To: <87imtlhk3k.fsf@gnu.org> ("Ludovic Courtès"'s message of "Tue, 04 Jun 2019 09:42:55 +0200")
Message-ID: <87sgsp8ne0.fsf@ngyro.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Spam-Score: -0.7 (/)
X-Debbugs-Envelope-To: 35785
Cc: Ricardo Wurmus , 35785@debbugs.gnu.org, Einar Largenius
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi,

Ludovic Courtès writes:

> Timothy Sample skribis:
>
> [...]
>
>> I needed to reload the modules like that to make the tests fail without
>> the patch and pass with it. My understanding is that the bug happens
>> at regex compile time, which happens when the module is loaded. If I
>> don’t reload the module, the old URI code passes the tests, since the
>> regexes were compiled with a locale that does not trigger the bug. It’s
>> a little wacky, sure, but it was the best idea I could come up with.
>
> Oooh, I see. Could you add a comment to explain this? Then we’re done.

Here it is! I hope it is clear.

-- Tim

--=-=-=
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: attachment;
 filename=0001-Make-URI-handling-locale-independent.patch
Content-Transfer-Encoding: quoted-printable
Content-Description: patch

>From 9ac8643e5315d4baaddb93ee246ba8db0b3448ab Mon Sep 17 00:00:00 2001
From: Timothy Sample
Date: Sun, 2 Jun 2019 14:41:20 -0400
Subject: [PATCH] Make URI handling locale independent.

Fixes .

* module/web/uri.scm (digits, hex-digits, letters): New variables.
(ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
list each character instead of using character ranges.
* test-suite/tests/web-uri.test: Add corresponding tests.
---
 module/web/uri.scm            | 31 +++++++++++++++++++++----------
 test-suite/tests/web-uri.test | 33 ++++++++++++++++++++++++++++++---
 2 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/module/web/uri.scm b/module/web/uri.scm
index 4c6fa5051..b4b89b9cc 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -1,6 +1,6 @@
 ;;;; (web uri) --- URI manipulation tools
 ;;;;
-;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014 Free Software Foundation, Inc.
+;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -175,17 +175,28 @@ for ‘build-uri’ except there is no scheme."
 ;;; Converters.
 ;;;
 
+;; Since character ranges in regular expressions may depend on the
+;; current locale, we use explicit lists of characters instead. See
+;; for details.
+(define digits "0123456789")
+(define hex-digits "0123456789ABCDEFabcdef")
+(define letters "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
+
 ;; See RFC 3986 #3.2.2 for comments on percent-encodings, IDNA (RFC
 ;; 3490), and non-ASCII host names.
 ;;
 (define ipv4-regexp
-  (make-regexp "^([0-9.]+)$"))
+  (make-regexp (string-append "^([" digits ".]+)$")))
 (define ipv6-regexp
-  (make-regexp "^([0-9a-fA-F:.]+)$"))
+  (make-regexp (string-append "^([" hex-digits ":.]+)$")))
 (define domain-label-regexp
-  (make-regexp "^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$"))
+  (make-regexp
+   (string-append "^[" letters digits "]"
+                  "([" letters digits "-]*[" letters digits "])?$")))
 (define top-label-regexp
-  (make-regexp "^[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?$"))
+  (make-regexp
+   (string-append "^[" letters "]"
+                  "([" letters digits "-]*[" letters digits "])?$")))
 
 (define (valid-host? host)
   (cond
@@ -203,13 +214,13 @@ for ‘build-uri’ except there is no scheme."
             (regexp-exec top-label-regexp host start)))))))
 
 (define userinfo-pat
-  "[a-zA-Z0-9_.!~*'();:&=+$,-]+")
+  (string-append "[" letters digits "_.!~*'();:&=+$,-]+"))
 (define host-pat
-  "[a-zA-Z0-9.-]+")
+  (string-append "[" letters digits ".-]+"))
 (define ipv6-host-pat
-  "[0-9a-fA-F:.]+")
+  (string-append "[" hex-digits ":.]+"))
 (define port-pat
-  "[0-9]*")
+  (string-append "[" digits "]*"))
 (define authority-regexp
   (make-regexp
    (format #f "^//((~a)@)?((~a)|(\\[(~a)\\]))(:(~a))?$"
@@ -246,7 +257,7 @@ for ‘build-uri’ except there is no scheme."
 ;;; either.
 
 (define scheme-pat
-  "[a-zA-Z][a-zA-Z0-9+.-]*")
+  (string-append "[" letters "][" letters digits "+.-]*"))
 (define authority-pat
   "[^/?#]*")
 (define path-pat
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 73391898c..94778acac 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -1,6 +1,6 @@
 ;;;; web-uri.test --- URI library -*- mode: scheme; coding: utf-8; -*-
 ;;;;
-;;;; Copyright (C) 2010-2012, 2014, 2017 Free Software Foundation, Inc.
+;;;; Copyright (C) 2010-2012, 2014, 2017, 2019 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -121,7 +121,21 @@
 
   (pass-if-uri-exception "http://foo@"
                          "Expected.*host"
-                         (build-uri 'http #:userinfo "foo")))
+                         (build-uri 'http #:userinfo "foo"))
+
+  ;; In this test, we need to reload the '(web uri)' module with a
+  ;; different locale. This is because some locale-dependent things
+  ;; (e.g., compiled regexes) are computed when the module is loaded.
+  (pass-if-uri-exception "http://illégal.com"
+                         "Expected.*host"
+                         (dynamic-wind
+                           (lambda () #t)
+                           (lambda ()
+                             (with-locale "en_US.utf8"
+                               (reload-module (resolve-module '(web uri)))
+                               (build-uri 'http #:host "illégal.com")))
+                           (lambda ()
+                             (reload-module (resolve-module '(web uri)))))))
 
 (with-test-prefix "build-uri-reference"
   (pass-if "//host/etc/foo"
@@ -290,7 +304,20 @@
            #:port 100
            #:path "/"
            #:query "q"
-           #:fragment "bar")))
+           #:fragment "bar"))
+
+  ;; This test reproduces bug #35785. See the 'illégal' test above for
+  ;; why we reload the module.
+  (pass-if "http://www.example.com (sv_SE)"
+    (dynamic-wind
+      (lambda () #t)
+      (lambda ()
+        (with-locale "sv_SE.utf8"
+          (reload-module (resolve-module '(web uri)))
+          (uri=? (string->uri "http://www.example.com")
+                 #:scheme 'http #:host "www.example.com" #:path "")))
+      (lambda ()
+        (reload-module (resolve-module '(web uri)))))))
 
 (with-test-prefix "string->uri-reference"
   (pass-if "/foo"
-- 
2.21.0

--=-=-=--


From debbugs-submit-bounces@debbugs.gnu.org Tue Jun 04 15:23:46 2019
Received: (at control) by debbugs.gnu.org; 4 Jun 2019 19:23:46 +0000
Received: from localhost ([127.0.0.1]:45807 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hYF27-0001EF-8P for submit@debbugs.gnu.org; Tue, 04 Jun 2019 15:23:43 -0400
Received: from eggs.gnu.org ([209.51.188.92]:34473) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hYF25-0001Dw-Hx for control@debbugs.gnu.org; Tue, 04 Jun 2019 15:23:42 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:40925) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYF20-0006aV-51 for control@debbugs.gnu.org; Tue, 04 Jun 2019 15:23:36 -0400
Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=53384 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1hYF1z-00049p-G7 for control@debbugs.gnu.org; Tue, 04 Jun 2019 15:23:35 -0400
Date: Tue, 04 Jun 2019 21:23:32 +0200
Message-Id: <87imtl8897.fsf@gnu.org>
To: control@debbugs.gnu.org
From: Ludovic Courtès
Subject: control message for bug #35785
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -3.3 (---)

reassign 35785 guile
quit


From debbugs-submit-bounces@debbugs.gnu.org Tue Jun 04 15:26:36 2019
Received: (at 35785-done) by debbugs.gnu.org; 4 Jun 2019 19:26:36 +0000
Received: from localhost ([127.0.0.1]:45814 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hYF4u-0001Jm-EB for submit@debbugs.gnu.org; Tue, 04 Jun 2019 15:26:36 -0400
Received: from eggs.gnu.org ([209.51.188.92]:34874) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1hYF4t-0001Ja-8E for 35785-done@debbugs.gnu.org; Tue, 04 Jun 2019 15:26:35 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:40953) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1hYF4n-0001N4-Gd; Tue, 04 Jun 2019 15:26:29 -0400
Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=53390 helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1hYF4l-0004NZ-A0; Tue, 04 Jun 2019 15:26:28 -0400
From: Ludovic Courtès
To: Timothy Sample
Subject: Re: bug#35785: ‘string->uri’ is locale-dependent and breaks in ‘sv_SE’
References: <878sv4j1au.fsf@gmail.com> <87d0kgvuxj.fsf@gnu.org> <87tvdqgwyg.fsf@gmail.com> <87blzxwkrn.fsf_-_@gnu.org> <87ftp017k6.fsf@elephly.net> <875zpw6mq0.fsf@ngyro.com>
 <8736ky3k1w.fsf@gnu.org> <87imtnsdsb.fsf@ngyro.com> <871s0ahlfq.fsf@gnu.org> <87ef4asq53.fsf@ngyro.com> <87imtlhk3k.fsf@gnu.org> <87sgsp8ne0.fsf@ngyro.com>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: 16 Prairial an 227 de la Révolution
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Tue, 04 Jun 2019 21:26:25 +0200
In-Reply-To: <87sgsp8ne0.fsf@ngyro.com> (Timothy Sample's message of "Tue, 04 Jun 2019 09:56:39 -0400")
Message-ID: <87d0jt884e.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 35785-done
Cc: Ricardo Wurmus , 35785-done@debbugs.gnu.org, Einar Largenius
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -3.3 (---)

Hi!

Timothy Sample skribis:

> From 9ac8643e5315d4baaddb93ee246ba8db0b3448ab Mon Sep 17 00:00:00 2001
> From: Timothy Sample
> Date: Sun, 2 Jun 2019 14:41:20 -0400
> Subject: [PATCH] Make URI handling locale independent.
>
> Fixes .
>
> * module/web/uri.scm (digits, hex-digits, letters): New variables.
> (ipv4-regexp, ipv6-regexp, domain-label-regexp, top-label-regexp,
> userinfo-pat, host-pat, ipv6-host-pat, port-pat, scheme-pat): Explicitly
> list each character instead of using character ranges.
> * test-suite/tests/web-uri.test: Add corresponding tests.

Perfect; pushed to the ‘stable-2.2’ branch as
420c2632bb1f48e492a035c1d216f209734f45e6.

We got a notification from the FSF that they received your copyright
assignment request too, so everything is on track.

Thank you!

Ludo’.


From unknown Thu Sep 11 06:07:34 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Wed, 03 Jul 2019 11:24:06 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator