From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 14 20:28:14 2018 Received: (at submit) by debbugs.gnu.org; 15 Jan 2018 01:28:14 +0000 Received: from localhost ([127.0.0.1]:56489 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eatZO-0000bD-Aq for submit@debbugs.gnu.org; Sun, 14 Jan 2018 20:28:14 -0500 Received: from eggs.gnu.org ([208.118.235.92]:47860) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eatZM-0000ay-1L for submit@debbugs.gnu.org; Sun, 14 Jan 2018 20:28:12 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eatZF-0002DY-HF for submit@debbugs.gnu.org; Sun, 14 Jan 2018 20:28:06 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:42150) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1eatZF-0002DR-DO for submit@debbugs.gnu.org; Sun, 14 Jan 2018 20:28:05 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38829) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1eatZD-0004NB-Qx for bug-guix@gnu.org; Sun, 14 Jan 2018 20:28:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1eatZA-0002BW-MA for bug-guix@gnu.org; Sun, 14 Jan 2018 20:28:03 -0500 Received: from mail-it0-x236.google.com ([2607:f8b0:4001:c0b::236]:42407) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1eatZA-0002BB-FT for bug-guix@gnu.org; Sun, 14 Jan 2018 20:28:00 -0500 Received: by mail-it0-x236.google.com with SMTP id p139so15002492itb.1 for ; Sun, 14 Jan 2018 17:28:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:user-agent:mime-version 
:content-transfer-encoding; bh=K6Qi2pb6IpfYoZ7a6ipqYFuWI1kMqOd+i89wbQtsyWM=; b=qtHKzu9PXdA+n3UXn2Xgm8DOGNEAndjWVzIivE7zTuM8S3RmOA8WoaRDDyTgcWg5y8 MhckM6fkjJbfHt+7X/MylgqMA+qJNRvHusU5APtlJGtM4xTz/9jK4UAOxd6dLAtD5h2/ mulv4IMsLvkXt6/ssUpUq8pjEKxtTkB3eXvwRjPN25RLJ0/8YCiHVOtm4/r+WcJ8PMTR ea+NM7+eLWt6WK+Lb69tV1qgr5hfP6G0OL+c0NPKQpSrQ8ad605VMhO+tqcrUwb4XREn qrEjuyMYCJt1VSN8iahPcqdCGWh1RRVHF668HCt6JXn7FmhgN/BdIbTG1OIdqFCcTZ9E gWuA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:user-agent :mime-version:content-transfer-encoding; bh=K6Qi2pb6IpfYoZ7a6ipqYFuWI1kMqOd+i89wbQtsyWM=; b=uHbNa2nDFOYEBgcDtXfu1Hcao+aXVlqGV0yfNlAiATNswoTXCmtjOkry94bvy0iWiO GdqREyvvIaAzdHcdD9U71Se3BG5grVrAHgq3hIMr8XPCQ48hdvVCij4BzxQfW6gQeMHT iNWRKL/1O48JJ2vLbXM6K707J0uD431iCxvXo6aY0oujfeftvijwCWTz603vNs/QtRVz vuH5bBduicQzNwPHymQ3AQJ3ci0WxjYz51atKjRZpKWDoZJj7C8vAKvDseK4XKsgcKR9 kOmO7z5govFXj8t8Zy4sQT/iM/+dFMwZ+678Z/giJXNwGu3kYsj0nw+2dcRfQNzvKWT1 E0ZQ== X-Gm-Message-State: AKwxyteh3cKiFURAWdIWgcyKQDIMAscISTh3pkb0r70LNMJekrbxC2Uv hXiUf9iuDwC1z5VU7D+Ip2IijA== X-Google-Smtp-Source: ACJfBouWfPnO5WS8hGXSiS14dArxvb8l/HnKIkPaYiKrsZrFL+gxX35zQ/auU5BN/zQLCRXfrXR3sg== X-Received: by 10.36.176.8 with SMTP id d8mr12547035itf.126.1515979679390; Sun, 14 Jan 2018 17:27:59 -0800 (PST) Received: from apteryx ([45.72.232.234]) by smtp.gmail.com with ESMTPSA id k73sm12600109ioe.24.2018.01.14.17.27.58 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 14 Jan 2018 17:27:58 -0800 (PST) From: Maxim Cournoyer To: bug-guix Subject: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) Date: Sun, 14 Jan 2018 20:27:57 -0500 Message-ID: <87r2qrc3mq.fsf@gmail.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: base64 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) SGVsbG8sDQoNCkkndmUgZW5jb3VudGVyZWQgdGhlIGZvbGxvd2luZyBjcmFzaCB3aGVuIHRyeWlu ZyB0byB1c2Ugc3Vic3RpdHV0ZSBvbiBhDQpmaWxlIHdoaWNoIGNvbnRhaW5zIE5VTCBjaGFyYWN0 ZXJzOg0KDQotLTg8LS0tLS0tLS0tLS0tLS0tY3V0IGhlcmUtLS0tLS0tLS0tLS0tLS1zdGFydC0t LS0tLS0tLS0tLS0+OC0tLQ0KKGRlZmluZSBwcm9ibGVtYXRpYy1maWxlICIvdG1wL2JwLWltYWdl LWRhdGEuZWwiKQ0Kc2NoZW1lQChndWl4IGJ1aWxkIHV0aWxzKT4gLG0gKGd1aXggYnVpbGQgdXRp bHMpDQpzY2hlbWVAKGd1aXggYnVpbGQgdXRpbHMpPiAoc3Vic3RpdHV0ZSogcHJvYmxlbWF0aWMt ZmlsZQ0KCQkJICAgICAoKCJ0b3RvIikgInRhdGEiKSkNCmljZS05L2Jvb3QtOS5zY206NzUyOjI1 OiBJbiBwcm9jZWR1cmUgZGlzcGF0Y2gtZXhjZXB0aW9uOg0Kc3RyaW5nIGNvbnRhaW5zICNcbnVs IGNoYXJhY3RlcjogIlwiSUkqXHgwMChceDAzXHgwMFx4MDDvv73vv73vv73vv73vv73vv73vv73v v73vv73vv71AQEBA77+977+977+977+9XHgwNFx4MDRceDA0XHgwNO+/ve+/ve+/ve+/vVx4MDFc eDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwMVx4 MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDRceDA0XHgwNFx4MDTvv73vv73vv73vv71CQkJC77+9 77+977+977+977+977+977+977+977+977+977+977+977+977+977+977+977+977+977+977+9 QEBAQO+/ve+/ve+/ve+/vVx4MDRceDA0XHgwNFx4MDTvv73vv73vv73vv71ceDAxXHgwMVx4MDFc eDAx77+977+977+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4 MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwMVx4MDFceDAxXHgw Me+/ve+/ve+/ve+/vVx4MDRceDA0XHgwNFx4MDTvv73vv73vv73vv71CQkJC77+977+977+977+9 77+977+977+977+977+977+977+977+9XHgwNFx4MDRceDA0XHgwNO+/ve+/ve+/ve+/vVx4MDFc eDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwMVx4 MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgw 
MVx4MDFceDAx77+977+977+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAx XHgwMVx4MDHvv73vv73vv73vv71ceDA0XHgwNFx4MDRceDA077+977+977+977+9QkJCQu+/ve+/ ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+9 77+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73v v73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+9 77+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MTBceDEwXHgxMFx4MTDvv73vv73vv73v v71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/ vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9 XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71c eDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4 MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDEwXHgxMFx4MTBceDEw77+977+977+977+9XHgw MVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAx XHgwMVx4MDFceDAx77+977+977+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFc eDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwMVx4 MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgw MVx4MDFceDAx77+977+977+977+9XHgxMFx4MTBceDEwXHgxMO+/ve+/ve+/ve+/vVx4MDRceDA0 XHgwNFx4MDTvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwMVx4MDFc eDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4 MDFceDAx77+977+977+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAxXHgw MVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwNFx4MDRceDA0 XHgwNO+/ve+/ve+/ve+/vT4+Pj7vv73vv73vv73vv708PDw877+977+977+977+9XHgwNFx4MDRc eDA0XHgwNO+/ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4 MDFceDAx77+977+977+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDFceDAxXHgw MVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+977+977+977+9XHgwNFx4MDRceDA0 
XHgwNO+/ve+/ve+/ve+/vT4+Pj7vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73v v73vv73vv73vv73vv73vv73vv73vv708PDw877+977+977+977+9XHgwNFx4MDRceDA0XHgwNO+/ ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHvv73vv73vv73vv71ceDAxXHgwMVx4MDFceDAx77+9 77+977+977+9XHgwMVx4MDFceDAxXHgwMe+/ve+/ve+/ve+/vVx4MDRceDA0XHgwNFx4MDTvv73v v73vv73vv70+Pj4+77+977+977+977+977+977+977+977+977+977+977+977+977+977+977+9 77+977+977+977+977+977+977+977+977+977+977+977+977+977+977+977+977+977+977+9 77+977+9PDw8PO+/ve+/ve+/ve+/vVx4MGZceDBmXHgwZlx4MGbvv73vv73vv73vv71ceDBmXHgw Zlx4MGZceDBm77+977+977+977+9XHgwZlx4MGZceDBmXHgwZu+/ve+/ve+/ve+/vT4+Pj7vv73v v73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73vv73v v73vv73vv73vv73vv73vv71ceDE0XHgwMFx4MDBceDAxXHgwM1x4MDBceDAxXHgwMFx4MDBceDAw XG4iDQoNCkVudGVyaW5nIGEgbmV3IHByb21wdC4gIFR5cGUgYCxidCcgZm9yIGEgYmFja3RyYWNl IG9yIGAscScgdG8gY29udGludWUuDQpzY2hlbWVAKGd1aXggYnVpbGQgdXRpbHMpIFsxXT4gLGJ0 DQpJbiBpY2UtOS9ib290LTkuc2NtOg0KICAgIDg0MTo0ICA5ICh3aXRoLXRocm93LWhhbmRsZXIg XyBfIF8pDQpJbiBpY2UtOS9wb3J0cy5zY206DQogICA0NDQ6MTcgIDggKGNhbGwtd2l0aC1pbnB1 dC1maWxlIF8gXyAjOmJpbmFyeSBfICM6ZW5jb2RpbmcgXyAjOmd1ZXNzLWVuY29kaW5nIF8pDQpJ biBndWl4L2J1aWxkL3V0aWxzLnNjbToNCiAgIDYwOToyNiAgNyAoXyBfKQ0KICAgNjM1OjI2ICA2 IChfICM8aW5wdXQ6IC90bXAvYnAtaW1hZ2UtZGF0YS5lbCAxND4gIzxpbnB1dC1vdXRwdXQ6IC90 bXAvYnAtaW1hZ2UtZGF0YS5lbC5xVnl0em8gMTM+KQ0KSW4gc3JmaS9zcmZpLTEuc2NtOg0KICAg NDY2OjE4ICA1IChmb2xkICM8cHJvY2VkdXJlIDdmMjliODkyOTUyMCBhdCBndWl4L2J1aWxkL3V0 aWxzLnNjbTo2MzU6MzIgKHIrcCBsaW5lKT4gIlwiSUkqXHgwMChceDAzXHgwMFx4MDDvv73vv73v v73igKYiIOKApikNCkluIGd1aXgvYnVpbGQvdXRpbHMuc2NtOg0KICAgNjM4OjM3ICA0IChfIF8g IlwiSUkqXHgwMChceDAzXHgwMFx4MDDvv73vv73vv73vv73vv73vv73vv73vv73vv73vv71AQEBA 77+977+977+977+9XHgwNFx4MDRceDA0XHgwNO+/ve+/ve+/ve+/vVx4MDFceDAxXHgwMVx4MDHv v73vv73vv73vv71ceDAxXHgwMVx4MDFceDDigKYiKQ0KSW4gaWNlLTkvcmVnZXguc2NtOg0KICAg MTg5OjEyICAzIChsaXN0LW1hdGNoZXMgXyBfIF8pDQogICAxNzc6MTkgIDIgKGZvbGQtbWF0Y2hl 
cyBfICJcIklJKlx4MDAoXHgwM1x4MDBceDAw77+977+977+977+977+977+977+977+977+977+9 QEBAQO+/ve+/ve+/ve+/vVx4MDRceDA0XHgwNFx4MDTvv73vv73vv73vv71ceDAxXHgwMVx4MDFc eDAx77+977+977+977+9XHgw4oCmIiDigKYpDQpJbiB1bmtub3duIGZpbGU6DQogICAgICAgICAg IDEgKHJlZ2V4cC1leGVjICM8cmVnZXhwIDUxZjNiYzA+ICJcIklJKlx4MDAoXHgwM1x4MDBceDAw 77+977+977+977+977+977+977+977+977+977+9QEBAQO+/ve+/ve+/ve+/vVx4MDRceDA0XHgw NFx4MDTvv73vv73vv73vv71ceDAxXHgwMeKApiIg4oCmKQ0KSW4gaWNlLTkvYm9vdC05LnNjbToN CiAgIDc1MjoyNSAgMCAoZGlzcGF0Y2gtZXhjZXB0aW9uIF8gXyBfKQ0KLS04PC0tLS0tLS0tLS0t LS0tLWN1dCBoZXJlLS0tLS0tLS0tLS0tLS0tZW5kLS0tLS0tLS0tLS0tLS0tPjgtLS0NCg0KVGhh dCBmaWxlIGNvbWVzIGZyb20gZW1hY3MtcmVhbGd1ZCwgd2hpY2ggSSdtIGF0dGVtcHRpbmcgdG8g cGFja2FnZToNCmh0dHBzOi8vZ2l0aHViLmNvbS9yZWFsZ3VkL3JlYWxndWQvYmxvYi9tYXN0ZXIv cmVhbGd1ZC9jb21tb24vYnAtaW1hZ2UtZGF0YS5lbC4NCg0KVGhpcyB3YXMgZGlzY292ZXJlZCB3 aGVuIHRoZSBwYXRjaC1lbC1maWxlcyBwaGFzZSBvZiB0aGUNCmVtYWNzLWJ1aWxkLXN5c3RlbSBj cmFzaGVkIGFzIGFib3ZlIHdoZW4gaXQgY2FsbGVkIHN1YnN0aXR1dGUqLg0KDQpQYXRjaCB0byBm b2xsb3cuDQoNCk1heGltDQo= From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 14 20:38:32 2018 Received: (at 30116) by debbugs.gnu.org; 15 Jan 2018 01:38:32 +0000 Received: from localhost ([127.0.0.1]:56516 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eatjM-0000ug-Hq for submit@debbugs.gnu.org; Sun, 14 Jan 2018 20:38:32 -0500 Received: from mail-it0-f48.google.com ([209.85.214.48]:37386) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eatjK-0000uT-Aw for 30116@debbugs.gnu.org; Sun, 14 Jan 2018 20:38:30 -0500 Received: by mail-it0-f48.google.com with SMTP id q8so3532994itb.2 for <30116@debbugs.gnu.org>; Sun, 14 Jan 2018 17:38:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:references:date:in-reply-to:message-id:user-agent :mime-version:content-disposition:content-transfer-encoding; bh=1HWDBprkdaEjBzsV3g15puImp29IJuSOYhsdsQBDX8w=; 
b=j8SzVL2+mecdWAnBXOtU0TDltYGvbGNT9Tnv48bFELWnyJKBeBiy3U1n+va8lK1MJl LHmKqCzUX1v3KdNmHxLThb+ZrNjcswtcA6vvChhF4hiAzGhsu9y0/P44ywXidLbRH048 99HUmf201PCzrbzVfavxzN5KEqG9i8m3U1VNPpMpmcZ1Ox2c1qgh7WhaB6I5cdN84jde 8UKD2vp6rlSeC9sGB3IUo1zv/eZAknocj57FRn6jR3B1gYxs+emMFRU0DFj8X/STYaRE B/CnEi9Nbra7UPxiTQ9Ik1sqNmSTn/7oir8CgcHrQ/Vaqzd8JAXuXxav0eW20F5Sv3Vz D4dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:references:date:in-reply-to :message-id:user-agent:mime-version:content-disposition :content-transfer-encoding; bh=1HWDBprkdaEjBzsV3g15puImp29IJuSOYhsdsQBDX8w=; b=Dx7RYzpZvAjcbbBYWPa4/BGcO+M/U4SN/tD46gYnavhMcLxypuWV4QLKqCGoKSAVxr Qs5wDalReM1R4jQBtHUKkKVH/ujNcaYZez8wCtQJUMxhKLCfo3irrc/hlxibszFC2dqV kZxWcQQoV1Bxn4e902LxZ/fRZUS7kisB0uqxscB3sf+QhrLkaPyk7UG4RffGxgTc3FxI lWQM8GnF8rzB7RauKZrVMdpHNkZh+xcVeMPRoMyIYtw+qRGse8H8I7/QmNtA86YE5+jk /4o4fte+Cha5HxDLRDlwgsgDZ0S3hqVcI9xSANiQrpz/I1uujzhGVP277WmY+oTfCB4s IdPg== X-Gm-Message-State: AKwxytcNJJtv3gR+8GtB+H9PTtkm3yuqlEsEJQTzwe0xMyzviXFm81kp QwxgxreYrbARbpQ/2z/7XmMSfA== X-Google-Smtp-Source: ACJfBotKyVmB6xR2lH4/iOFHcpthjBuYZNI6dmZMrJCxdUYUTamo0nC2i6A7QJp8wjpgxtN/rGiP0A== X-Received: by 10.36.241.65 with SMTP id q1mr11793129iti.4.1515980304522; Sun, 14 Jan 2018 17:38:24 -0800 (PST) Received: from apteryx ([45.72.232.234]) by smtp.gmail.com with ESMTPSA id b34sm2333437itd.38.2018.01.14.17.38.23 for <30116@debbugs.gnu.org> (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 14 Jan 2018 17:38:23 -0800 (PST) From: Maxim Cournoyer To: 30116@debbugs.gnu.org Subject: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)) References: <87r2qrc3mq.fsf@gmail.com> Date: Sun, 14 Jan 2018 20:38:22 -0500 In-Reply-To: (GNU bug Tracking System's message of "Mon, 15 Jan 2018 01:29:02 +0000") Message-ID: <87k1wjc35d.fsf_-_@gmail.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: 
text/x-patch; charset=utf-8
Content-Disposition: attachment; filename=0001-utils-Prevent-substitute-from-crashing-on-files-cont.patch

>From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
From: Maxim Cournoyer
Date: Sun, 14 Jan 2018 20:31:33 -0500
Subject: [PATCH] utils: Prevent substitute from crashing on files containing
 NUL chars.

Fixes issue #30116.

* guix/build/utils.scm (substitute): Add condition to skip lines containing
the NUL character.
---
 guix/build/utils.scm | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index 7391307c8..975f4e70a 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -3,6 +3,7 @@
 ;;; Copyright © 2013 Andreas Enge
 ;;; Copyright © 2013 Nikita Karetnikov
 ;;; Copyright © 2015 Mark H Weaver
+;;; Copyright © 2018 Maxim Cournoyer
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -621,28 +622,35 @@ PROC as (PROC LINE MATCHES); PROC must return the line that will be written as a
 substitution of the original line. Be careful about using '$' to match the end
 of a line; by itself it won't match the terminating newline of a line."
   (let ((rx+proc (map (match-lambda
-                       (((? regexp? pattern) . proc)
-                        (cons pattern proc))
-                       ((pattern . proc)
-                        (cons (make-regexp pattern regexp/extended)
-                              proc)))
+                        (((? regexp? pattern) . proc)
+                         (cons pattern proc))
+                        ((pattern . proc)
+                         (cons (make-regexp pattern regexp/extended)
+                               proc)))
                       pattern+procs)))
     (with-atomic-file-replacement file
       (lambda (in out)
         (let loop ((line (read-line in 'concat)))
-          (if (eof-object? line)
-              #t
-              (let ((line (fold (lambda (r+p line)
-                                  (match r+p
-                                    ((regexp . proc)
-                                     (match (list-matches regexp line)
-                                       ((and m+ (_ _ ...))
-                                        (proc line m+))
-                                       (_ line)))))
-                                line
-                                rx+proc)))
-                (display line out)
-                (loop (read-line in 'concat)))))))))
+          (cond
+           ((eof-object? line)
+            #t)
+           ((string-contains line (make-string 1 #\nul))
+            ;; The regexp functions of the GNU C library (which Guile uses)
+            ;; cannot deal with NUL characters, so skip to the next line.
+            (format #t "skipping line with NUL characters: ~s\n" line)
+            (loop (read-line in 'concat)))
+           (else
+            (let ((line (fold (lambda (r+p line)
+                                (match r+p
+                                  ((regexp . proc)
+                                   (match (list-matches regexp line)
+                                     ((and m+ (_ _ ...))
+                                      (proc line m+))
+                                     (_ line)))))
+                              line
+                              rx+proc)))
+              (display line out)
+              (loop (read-line in 'concat))))))))))
 
 
 (define-syntax let-matches
-- 
2.15.1

From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 16 06:23:21 2018
From: ludo@gnu.org (Ludovic Courtès)
To: Maxim Cournoyer
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
References: <87r2qrc3mq.fsf@gmail.com>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: 27 Nivôse an 226 de la Révolution
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Tue, 16 Jan 2018 12:23:17 +0100
In-Reply-To: <87r2qrc3mq.fsf@gmail.com> (Maxim Cournoyer's message of "Sun, 14 Jan 2018 20:27:57 -0500")
Message-ID: <87o9lu6o9m.fsf@gnu.org>
Cc: 30116@debbugs.gnu.org

Hi,

Maxim Cournoyer skribis:

> I've encountered the following crash when trying to use substitute on a
> file which contains NUL characters:

Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
which does not handle NULs.

We should consider switching to the pure-Scheme SRFI-115:

  https://srfi.schemers.org/srfi-115/srfi-115.html

Ludo’.
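
The limitation described above can be sidestepped at the call site. What follows is only a minimal sketch, assuming a Guile whose `regexp-exec' passes the string to libc's `regexec'; the helper name `regexp-exec/no-nul' is hypothetical and is not part of Guile or Guix. It checks for NUL with `string-index' before matching, so the caller gets #f instead of an exception:

```scheme
(use-modules (ice-9 regex))            ; match:substring

;; Hypothetical helper, not part of Guile or Guix: behave like
;; 'regexp-exec', but treat a string containing NUL as "no match"
;; instead of handing it to the libc-backed matcher.
(define (regexp-exec/no-nul rx str)
  (if (string-index str #\nul)
      #f
      (regexp-exec rx str)))

(define rx (make-regexp "toto" regexp/extended))

;; Ordinary text is matched as usual; this displays the matched text.
(let ((m (regexp-exec/no-nul rx "say toto\n")))
  (display (and m (match:substring m)))
  (newline))

;; A string with an embedded NUL is reported as unmatched.
(display (regexp-exec/no-nul rx (string #\t #\o #\nul #\t #\o)))
(newline)
```

Returning #f keeps the helper compatible with code that already treats "no match" as a normal outcome; whether silently ignoring such strings is acceptable depends on the caller.
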
From debbugs-submit-bounces@debbugs.gnu.org Wed Jan 17 09:37:59 2018
From: ludo@gnu.org (Ludovic Courtès)
To: Maxim Cournoyer
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com>
X-Revolutionary-Date: 28 Nivôse an 226 de la Révolution
Date: Wed, 17 Jan 2018 15:37:54 +0100
In-Reply-To: <87k1wjc35d.fsf_-_@gmail.com> (Maxim Cournoyer's message of "Sun, 14 Jan 2018 20:38:22 -0500")
Message-ID: <87h8rk4kl9.fsf@gnu.org>
Cc: 30116@debbugs.gnu.org

Maxim Cournoyer skribis:

> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer
> Date: Sun, 14 Jan 2018 20:31:33 -0500
> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
>  NUL chars.
>
> Fixes issue #30116.
>
> * guix/build/utils.scm (substitute): Add condition to skip lines containing
> the NUL character.

[...]

> +           ((string-contains line (make-string 1 #\nul))

Rather (string-index line #\nul).

> +            ;; The regexp functions of the GNU C library (which Guile uses)
> +            ;; cannot deal with NUL characters, so skip to the next line.
> +            (format #t "skipping line with NUL characters: ~s\n" line)
> +            (loop (read-line in 'concat)))

Rather (format (current-error-port) …).

It’s strange semantics, but it’s probably better than crashing in the
contexts where we use it.

Otherwise LGTM. This would have to go to the next ‘core-updates’ (or
‘core-updates-next’ in the meantime.)

Thanks!
Ludo’.
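
The two review suggestions above, `string-index' and `(current-error-port)', can be put together as follows. This is a sketch only, not the revised patch: `substitute-lines' is a hypothetical stand-alone helper that works on ports and takes a single regexp and replacement string, whereas the real `substitute' takes a file and a list of pattern/procedure pairs.

```scheme
(use-modules (ice-9 rdelim)            ; read-line
             (ice-9 regex))            ; regexp-substitute/global

;; Hypothetical stand-alone helper (not the Guix 'substitute' procedure):
;; copy IN to OUT line by line, replacing every match of RX with
;; REPLACEMENT.  Lines containing NUL are reported on the error port and,
;; as in the patch under review, are not written to OUT.
(define (substitute-lines in out rx replacement)
  (let loop ((line (read-line in 'concat)))
    (cond
     ((eof-object? line)
      #t)
     ((string-index line #\nul)
      (format (current-error-port)
              "skipping line with NUL characters: ~s~%" line)
      (loop (read-line in 'concat)))
     (else
      (display (regexp-substitute/global #f rx line 'pre replacement 'post)
               out)
      (loop (read-line in 'concat))))))

;; Usage on in-memory ports: the middle line holds a NUL and is dropped.
(define input
  (string-append "toto one\n"
                 (string #\t #\o #\nul #\newline)
                 "toto two\n"))

(define result
  (call-with-output-string
    (lambda (out)
      (call-with-input-string input
        (lambda (in)
          (substitute-lines in out (make-regexp "toto") "tata"))))))

(display result)
```

Dropping the offending line from the output is the "strange semantics" mentioned in the review; a variant that copies such lines through unchanged would only need an extra `(display line out)' in the NUL branch.
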
From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 20 23:24:46 2018
From: Maxim Cournoyer
To: ludo@gnu.org (Ludovic Courtès)
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
References: <87r2qrc3mq.fsf@gmail.com> <87o9lu6o9m.fsf@gnu.org>
Date: Sat, 20 Jan 2018 23:24:34 -0500
In-Reply-To: <87o9lu6o9m.fsf@gnu.org> ("Ludovic Courtès"'s message of "Tue, 16 Jan 2018 12:23:17 +0100")
Message-ID: <87607vu9dp.fsf@gmail.com>
Cc: 30116@debbugs.gnu.org

ludo@gnu.org (Ludovic Courtès) writes:

> Hi,
>
> Maxim Cournoyer skribis:
>
>> I've encountered the following crash when trying to use substitute on a
>> file which contains NUL characters:
>
> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
> which does not handle NULs.
>
> We should consider switching to the pure-Scheme SRFI-115:
>
>   https://srfi.schemers.org/srfi-115/srfi-115.html

This looks good, and I started looking into porting `substitute' to it,
but quickly noticed it doesn't seem to be implemented in Guile yet?

Thanks,

Maxim

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 21 13:18:24 2018
From: Mark H Weaver
To: Maxim Cournoyer
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
References: <87r2qrc3mq.fsf@gmail.com> <87o9lu6o9m.fsf@gnu.org> <87607vu9dp.fsf@gmail.com>
Date: Sun, 21 Jan 2018 13:17:45 -0500
In-Reply-To: <87607vu9dp.fsf@gmail.com> (Maxim Cournoyer's message of "Sat, 20 Jan 2018 23:24:34 -0500")
Message-ID: <877esb84ae.fsf@netris.org>
Cc: Ludovic Courtès, 30116@debbugs.gnu.org

Maxim Cournoyer writes:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Maxim Cournoyer skribis:
>>
>>> I've encountered the following crash when trying to use substitute on a
>>> file which contains NUL characters:
>>
>> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
>> which does not handle NULs.
>>
>> We should consider switching to the pure-Scheme SRFI-115:
>>
>>   https://srfi.schemers.org/srfi-115/srfi-115.html
>
> This looks good, and I started looking into porting `substitute' to it,
> but quickly noticed it doesn't seem to be implemented in Guile yet?

Indeed. SRFI-115 for Guile is on my TODO list, although it might be
better to wait until after we switch to using UTF-8 encoding internally
for strings, since that will drastically affect the implementation of
any efficient regexp matcher on Scheme strings.

Anyway, 'substitute*' is to be used only on text files, and NUL bytes
are not a valid textual character. So, I think that this case is
outside of what 'substitute*' is meant to do, and therefore not a bug in
'substitute*', although of course a more graceful error would surely be
preferable.

What do you think?

Mark From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 22 05:58:42 2018 Received: (at 30116) by debbugs.gnu.org; 22 Jan 2018 10:58:42 +0000 Received: from localhost ([127.0.0.1]:37737 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1edZoH-0007jl-Rl for submit@debbugs.gnu.org; Mon, 22 Jan 2018 05:58:42 -0500 Received: from hera.aquilenet.fr ([185.233.100.1]:60822) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1edZoF-0007jc-I1 for 30116@debbugs.gnu.org; Mon, 22 Jan 2018 05:58:39 -0500 Received: from localhost (localhost [127.0.0.1]) by hera.aquilenet.fr (Postfix) with ESMTP id E1DA21022A; Mon, 22 Jan 2018 11:58:38 +0100 (CET) X-Virus-Scanned: Debian amavisd-new at aquilenet.fr Received: from hera.aquilenet.fr ([127.0.0.1]) by localhost (hera.aquilenet.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MOv8O8CUgGPj; Mon, 22 Jan 2018 11:58:37 +0100 (CET) Received: from ribbon (unknown [193.50.110.135]) by hera.aquilenet.fr (Postfix) with ESMTPSA id C101710228; Mon, 22 Jan 2018 11:58:37 +0100 (CET) From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) To: Mark H Weaver Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) References: <87r2qrc3mq.fsf@gmail.com> <87o9lu6o9m.fsf@gnu.org> <87607vu9dp.fsf@gmail.com> <877esb84ae.fsf@netris.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 3 =?utf-8?Q?Pluvi=C3=B4se?= an 226 de la =?utf-8?Q?R?= =?utf-8?Q?=C3=A9volution?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Mon, 22 Jan 2018 11:58:37 +0100 In-Reply-To: <877esb84ae.fsf@netris.org> (Mark H. 
Weaver's message of "Sun, 21 Jan 2018 13:17:45 -0500") Message-ID: <87o9lmp3c2.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 30116 Cc: 30116@debbugs.gnu.org, Maxim Cournoyer X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) Mark H Weaver skribis: > Maxim Cournoyer writes: > >> ludo@gnu.org (Ludovic Court=C3=A8s) writes: >> >>> Maxim Cournoyer skribis: >>> >>>> I've encountered the following crash when trying to use substitute on a >>>> file which contains NUL characters: >>> >>> Yes, that=E2=80=99s because Guile=E2=80=99s =E2=80=98regexp-exec=E2=80= =99 simply wraps libc=E2=80=99s =E2=80=98regexec=E2=80=99, >>> which does not handle NULs. >>> >>> We should consider switching to the pure-Scheme SRFI-115: >>> >>> https://srfi.schemers.org/srfi-115/srfi-115.html >> >> This looks good, and I started looking into porting `substitute' to it, >> but quickly noticed it doesn't seem to be implemented in Guile yet? ISTR that the reference implementation works fine on Guile. > Indeed. SRFI-115 for Guile is on my TODO list, although it might be > better to wait until after we switch to using UTF-8 encoding internally > for strings, since that will drastically affect the implementation of > any efficient regexp matcher on Scheme strings. Indeed, though I suppose it doesn=E2=80=99t matter much for the cases where =E2=80=98substitute*=E2=80=99 is used? > Anyway, 'substitute*' is to be used only on text files, and NUL bytes > are not a valid textual character. 
> So, I think that this case is
> outside of what 'substitute*' is meant to do, and therefore not a bug in
> 'substitute*', although of course a more graceful error would surely be
> preferable.

Yes, that’s also a good point.

So yeah, I think it may be good “eventually” to switch to SRFI-115, but
that’s not urgent.

Thoughts?

Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 22 23:27:16 2018
From: Maxim Cournoyer
To: ludo@gnu.org (Ludovic Courtès)
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
Date: Mon, 22 Jan 2018 23:27:04 -0500
In-Reply-To: <87o9lmp3c2.fsf@gnu.org> ("Ludovic Courtès"'s message of "Mon, 22 Jan 2018 11:58:37 +0100")
Message-ID: <87k1w9ryhz.fsf@gmail.com>
Content-Type: multipart/mixed; boundary="=-=-="
Cc: Mark H Weaver , 30116@debbugs.gnu.org

--=-=-=
Content-Type: text/plain; charset=utf-8

ludo@gnu.org (Ludovic Courtès) writes:
> Mark H Weaver skribis:
>
>> Maxim Cournoyer writes:
>>
>>> ludo@gnu.org (Ludovic Courtès) writes:
>>>
>>>> Maxim Cournoyer skribis:
>>>>
>>>>> I've encountered the following crash when trying to use substitute on a
>>>>> file which contains NUL characters:
>>>>
>>>> Yes, that’s because Guile’s ‘regexp-exec’ simply wraps libc’s ‘regexec’,
>>>> which does not handle NULs.
>>>>
>>>> We should consider switching to the pure-Scheme SRFI-115:
>>>>
>>>>   https://srfi.schemers.org/srfi-115/srfi-115.html
>>>
>>> This looks good, and I started looking into porting `substitute' to it,
>>> but quickly noticed it doesn't seem to be implemented in Guile yet?
>
> ISTR that the reference implementation works fine on Guile.
>
>> Indeed. SRFI-115 for Guile is on my TODO list, although it might be
>> better to wait until after we switch to using UTF-8 encoding internally
>> for strings, since that will drastically affect the implementation of
>> any efficient regexp matcher on Scheme strings.
>
> Indeed, though I suppose it doesn’t matter much for the cases where
> ‘substitute*’ is used?
>
>> Anyway, 'substitute*' is to be used only on text files, and NUL bytes
>> are not a valid textual character. So, I think that this case is
>> outside of what 'substitute*' is meant to do, and therefore not a bug in
>> 'substitute*', although of course a more graceful error would surely be
>> preferable.
>
> Yes, that’s also a good point.
>
> So yeah, I think it may be good “eventually” to switch to SRFI-115, but
> that’s not urgent.

Sorry for taking some time to answer; I was puzzled by the fact that my
repro didn't work when run from the REPL. It seems the problem only
occurs when run inside Guix's build environment, maybe a side effect
which depends on the locale used?

In the `patch-el-files' phase of the emacs-build-system, we find the
following snippet:

    (with-directory-excursion el-dir
      ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
      ;; with the "ISO-8859-1" locale.
      (unless (false-if-exception (substitute-cmd))
        (with-fluids ((%default-port-encoding "ISO-8859-1"))
          (substitute-cmd))))

In case an exception is raised while processing the file, it is
retried with the "ISO-8859-1" encoding. Now, this resolves to a
call to `open-file', whose documentation says:

    ‘b’
         Use binary mode, ensuring that each byte in the file will be
         read as one Scheme character.

         To provide this property, the file will be opened with the
         8-bit character encoding "ISO-8859-1", ignoring the default
         port encoding. *Note Ports::, for more information on port
         encodings.

So, by opening a file whose encoding is unknown as an ISO-8859-1 file,
we are doing the same as if we had passed the 'binary option. Could this
explain why we end up with NUL characters where we were expecting text?

To validate this hypothesis, I've added the following test message to
the patch-el-files phase:

    (unless (false-if-exception (substitute-cmd))
      (format (current-error-port) ">>> IS THIS IT? <<<")
      (with-fluids ((%default-port-encoding "ISO-8859-1"))
        (substitute-cmd))))

And re-ran the emacs-realgud build (minus the patch working around this
issue), and this is what I got:

--8<---------------cut here---------------start------------->8---
starting phase `patch-el-files'
>>> IS THIS IT?
<< …)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/gnu-build-system.scm:
   684:27 12 (_ _)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/emacs-build-system.scm:
   117:10 11 (patch-el-files #:outputs _)
In srfi/srfi-1.scm:
    640:9 10 (for-each # _)
In ice-9/boot-9.scm:
    849:4  9 (with-throw-handler _ _ _)
In ice-9/ports.scm:
   444:17  8 (call-with-input-file _ _ #:binary _ #:encoding _ # _)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/utils.scm:
   609:26  7 (_ _)
   635:26  6 (_ # #)
In srfi/srfi-1.scm:
   466:18  5 (fold # …)
In /gnu/store/mz8vs1cxv1z7yrc1awzgby61qnxd481p-module-import/guix/build/utils.scm:
   638:37  4 (_ _ "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\…")
In ice-9/regex.scm:
   189:12  3 (list-matches _ _ _)
   177:19  2 (fold-matches _ "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿ…" …)
In unknown file:
           1 (regexp-exec # "\"II*\x00(\x03\x00\x00ÿ…" …)
In ice-9/boot-9.scm:
   760:25  0 (dispatch-exception _ _ _)

ice-9/boot-9.scm:760:25: In procedure dispatch-exception:
ice-9/boot-9.scm:760:25: string contains #\nul character: "\"II*\x00(\x03\x00\x00ÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ@@@@ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿÿÿÿÿÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿBBBBÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x10\x10\x10\x10ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿþÿ<<<<ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ<<<<ÿÿÿÿ\x04\x04\x04\x04ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x01\x01\x01\x01ÿÿÿÿ\x04\x04\x04\x04ÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿþÿ<<<<ÿÿÿÿ\x0f\x0f\x0f\x0fÿÿÿÿ\x0f\x0f\x0f\x0fÿÿÿÿ\x0f\x0f\x0f\x0fÿÿþÿ>>>>ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ\x14\x00\x00\x01\x03\x00\x01\x00\x00\x00\n"
builder for `/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv' failed with exit code 1
@ build-failed /gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv - 1 builder for `/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv' failed with exit code 1
guix build: error: build failed: build of `/gnu/store/ar2j6kxz99s3s5wjs2z7ykiw75m9vv72-emacs-realgud-1.4.4.drv' failed
--8<---------------cut
here---------------end--------------->8---

So it is indeed triggered by switching to the "ISO-8859-1" encoding
(although I still cannot reproduce this from the REPL?).

If I remove the exception guard like this:

--8<---------------cut here---------------start------------->8---
     (with-directory-excursion el-dir
       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
       ;; with the "ISO-8859-1" locale.
-      (unless (false-if-exception (substitute-cmd))
-        (with-fluids ((%default-port-encoding "ISO-8859-1"))
-          (substitute-cmd))))
+      (substitute-cmd))
     #t))
--8<---------------cut here---------------end--------------->8---

the exception thrown on the first substitute* call is this:

--8<---------------cut here---------------start------------->8---
starting phase `patch-el-files'
Backtrace:
          12 (primitive-load "/gnu/store/dvyyqxfr08fsr18k2f43gakh23d…")
In ice-9/eval.scm:
   191:35 11 (_ _)
In srfi/srfi-1.scm:
   863:16 10 (every1 # …)
In /gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/gnu-build-system.scm:
   684:27  9 (_ _)
In /gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/emacs-build-system.scm:
   104:27  8 (patch-el-files #:outputs _)
In srfi/srfi-1.scm:
    640:9  7 (for-each # _)
In ice-9/boot-9.scm:
    849:4  6 (with-throw-handler _ _ _)
In ice-9/ports.scm:
   444:17  5 (call-with-input-file _ _ #:binary _ #:encoding _ # _)
In /gnu/store/xn6p33hhfyz6l5j9jd9qpnblp9ajnb9k-module-import/guix/build/utils.scm:
   609:26  4 (_ _)
   645:22  3 (_ # #)
In ice-9/rdelim.scm:
   195:24  2 (read-line _ _)
In unknown file:
           1 (%read-line #)
In ice-9/boot-9.scm:
   760:25  0 (dispatch-exception _ _ _)

ice-9/boot-9.scm:760:25: In procedure dispatch-exception:
ice-9/boot-9.scm:760:25: Throw to key `decoding-error' with args `("peek-char" "input decoding error" 84 #)'.
--8<---------------cut here---------------end--------------->8---

Should we keep my workaround for now?
It seems there are valid cases to
have the file opened as "ISO-8859-1", but this can mean introducing
binary symbols such as NUL in the data (thus regexp crashes). When we
finally move to srfi-115, we should remove this workaround.

WDYT?

Here's an updated patch with Ludovic's suggestion:

--=-=-=
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: attachment;
 filename=0001-utils-Prevent-substitute-from-crashing-on-files-cont.patch

>From 573ecb3570355c47aa18c091c0a193e7d90a6949 Mon Sep 17 00:00:00 2001
From: Maxim Cournoyer
Date: Sun, 14 Jan 2018 20:31:33 -0500
Subject: [PATCH] utils: Prevent substitute from crashing on files containing
 NUL chars.

Fixes issue #30116.

* guix/build/utils.scm (substitute): Add condition to skip lines containing
the NUL character.
---
 guix/build/utils.scm | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/guix/build/utils.scm b/guix/build/utils.scm
index 7391307c8..2a37dba06 100644
--- a/guix/build/utils.scm
+++ b/guix/build/utils.scm
@@ -3,6 +3,7 @@
 ;;; Copyright © 2013 Andreas Enge
 ;;; Copyright © 2013 Nikita Karetnikov
 ;;; Copyright © 2015 Mark H Weaver
+;;; Copyright © 2018 Maxim Cournoyer
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -621,28 +622,37 @@ PROC as (PROC LINE MATCHES); PROC must return the line that will be written as
 a substitution of the original line. Be careful about using '$' to match the
 end of a line; by itself it won't match the terminating newline of a line."
   (let ((rx+proc  (map (match-lambda
-                        (((? regexp? pattern) . proc)
-                         (cons pattern proc))
-                        ((pattern . proc)
-                         (cons (make-regexp pattern regexp/extended)
-                               proc)))
+                         (((? regexp? pattern) . proc)
+                          (cons pattern proc))
+                         ((pattern .
proc)
+                          (cons (make-regexp pattern regexp/extended)
+                                proc)))
                        pattern+procs)))
     (with-atomic-file-replacement file
       (lambda (in out)
         (let loop ((line (read-line in 'concat)))
-          (if (eof-object? line)
-              #t
-              (let ((line (fold (lambda (r+p line)
-                                  (match r+p
-                                    ((regexp . proc)
-                                     (match (list-matches regexp line)
-                                       ((and m+ (_ _ ...))
-                                        (proc line m+))
-                                       (_ line)))))
-                                line
-                                rx+proc)))
-                (display line out)
-                (loop (read-line in 'concat)))))))))
+          (cond
+           ((eof-object? line)
+            #t)
+           ((string-index line #\nul)
+            ;; The regexp functions of the GNU C library (which Guile uses)
+            ;; cannot deal with NUL characters, so skip to the next line.
+            ;; TODO: Port to srfi-115, once we have it implemented in Guile.
+            (format (current-error-port)
+                    "skipping line with NUL characters: ~s\n" line)
+            (loop (read-line in 'concat)))
+           (else
+            (let ((line (fold (lambda (r+p line)
+                                (match r+p
+                                  ((regexp . proc)
+                                   (match (list-matches regexp line)
+                                     ((and m+ (_ _ ...))
+                                      (proc line m+))
+                                     (_ line)))))
+                              line
+                              rx+proc)))
+              (display line out)
+              (loop (read-line in 'concat))))))))))
 
 
 (define-syntax let-matches
-- 
2.16.0

--=-=-=
Content-Type: text/plain

Maxim

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 23 09:11:07 2018
From: ludo@gnu.org (Ludovic Courtès)
To: Maxim Cournoyer
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
Date: Tue, 23 Jan 2018 15:11:04 +0100
In-Reply-To: <87k1w9ryhz.fsf@gmail.com> (Maxim Cournoyer's message of "Mon, 22 Jan 2018 23:27:04 -0500")
Message-ID: <87o9lk64xz.fsf@gnu.org>
Cc: Mark H Weaver , 30116@debbugs.gnu.org

Maxim Cournoyer skribis:

> In the `patch-el-files' phase of the emacs-build-system, we find the
> following snippet:
>
>     (with-directory-excursion el-dir
>       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
>       ;; with the "ISO-8859-1" locale.
>       (unless (false-if-exception (substitute-cmd))
>         (with-fluids ((%default-port-encoding "ISO-8859-1"))
>           (substitute-cmd))))
>
> In case an exception is raised while processing the file, it is
> retried with the "ISO-8859-1" encoding. Now, this resolves to a
> call to `open-file', whose documentation says:
>
>     ‘b’
>          Use binary mode, ensuring that each byte in the file will be
>          read as one Scheme character.
>
>          To provide this property, the file will be opened with the
>          8-bit character encoding "ISO-8859-1", ignoring the default
>          port encoding. *Note Ports::, for more information on port
>          encodings.
>
> So, by opening a file whose encoding is unknown as an ISO-8859-1 file,
> we are doing the same as if we had passed the 'binary option. Could this
> explain why we end up with NUL characters where we were expecting text?

That could be the reason. Guile provides a way to honor Emacs-style
‘encoding’ declarations, and ‘call-with-input-file’ does that if we pass
#:guess-encoding #t (info "(guile) Character Encoding of Source Files").

Did the faulty file have such a declaration?

Thanks,
Ludo’.
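
The hypothesis discussed above can be sketched in isolation. This is only an illustration, assuming a Guile that provides (ice-9 iconv); the five-byte sample is made up from the first bytes visible in the backtrace plus one 0xFF byte, and is not the actual file:

```scheme
(use-modules (ice-9 iconv)        ; bytevector->string
             (ice-9 regex)        ; libc-backed regexps
             (rnrs bytevectors))  ; u8-list->bytevector

;; Hypothetical sample data: "II*", a NUL byte, then a lone 0xFF byte.
(define bytes (u8-list->bytevector '(#x49 #x49 #x2a #x00 #xff)))

;; Strict UTF-8 decoding should reject the lone 0xFF byte, so this
;; evaluates to #f instead of a string.
(define as-utf8
  (false-if-exception (bytevector->string bytes "UTF-8")))

;; ISO-8859-1 maps every byte to exactly one character, NUL included,
;; so decoding always succeeds.
(define as-latin1 (bytevector->string bytes "ISO-8859-1"))

(format #t "decodes as UTF-8: ~a~%" (and as-utf8 #t))
(format #t "ISO-8859-1 length: ~a, contains NUL: ~a~%"
        (string-length as-latin1)
        (and (string-index as-latin1 #\nul) #t))

;; Going by the backtrace in this thread, handing such a string to
;; regexp-exec is expected to throw; false-if-exception turns that into #f.
(format #t "regexp-exec succeeded: ~a~%"
        (and (false-if-exception
              (regexp-exec (make-regexp "II") as-latin1))
             #t))
```

If this behaves as assumed, the retry under "ISO-8859-1" can never fail at the decoding stage, which only moves the failure to the regexp call.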

From debbugs-submit-bounces@debbugs.gnu.org Thu Jan 25 00:11:37 2018
From: Maxim Cournoyer
To: ludo@gnu.org (Ludovic Courtès)
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
Date: Thu, 25 Jan 2018 00:11:26 -0500
In-Reply-To: <87o9lk64xz.fsf@gnu.org> ("Ludovic Courtès"'s message of "Tue, 23 Jan 2018 15:11:04 +0100")
Message-ID: <87po5y8qv5.fsf@gmail.com>
Cc: Mark H Weaver , 30116@debbugs.gnu.org

ludo@gnu.org (Ludovic Courtès) writes:

> Maxim Cournoyer skribis:
>
>> In the `patch-el-files' phase of the emacs-build-system, we find the
>> following snippet:
>>
>>     (with-directory-excursion el-dir
>>       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
>>       ;; with the "ISO-8859-1" locale.
>>       (unless (false-if-exception (substitute-cmd))
>>         (with-fluids ((%default-port-encoding "ISO-8859-1"))
>>           (substitute-cmd))))
>>
>> In case an exception is raised while processing the file, it is
>> retried with the "ISO-8859-1" encoding. Now, this resolves to a
>> call to `open-file', whose documentation says:
>>
>>     ‘b’
>>          Use binary mode, ensuring that each byte in the file will be
>>          read as one Scheme character.
>>
>>          To provide this property, the file will be opened with the
>>          8-bit character encoding "ISO-8859-1", ignoring the default
>>          port encoding. *Note Ports::, for more information on port
>>          encodings.
>>
>> So, by opening a file whose encoding is unknown as an ISO-8859-1 file,
>> we are doing the same as if we had passed the 'binary option. Could this
>> explain why we end up with NUL characters where we were expecting text?
>
> That could be the reason. Guile provides a way to honor Emacs-style
> ‘encoding’ declarations, and ‘call-with-input-file’ does that if we pass
> #:guess-encoding #t (info "(guile) Character Encoding of Source Files").
>
> Did the faulty file have such a declaration?

Sadly, it doesn't. Although even if it did, I don't think it would be
very robust to expect every misbehaving file we might encounter to
include one!

So I think we should apply my v2 patch to core-updates for now (see my
previous reply on this thread), until we have our substitute routine
implemented using srfi-115!

Thanks,

Maxim

From debbugs-submit-bounces@debbugs.gnu.org Thu Jan 25 06:11:35 2018
From: ludo@gnu.org (Ludovic Courtès)
To: Maxim Cournoyer
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
Date: Thu, 25 Jan 2018 12:11:30 +0100
In-Reply-To: <87po5y8qv5.fsf@gmail.com> (Maxim Cournoyer's message of "Thu, 25 Jan 2018 00:11:26 -0500")
Message-ID: <87shauxkf1.fsf@gnu.org>
User-Agent:
 Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
Cc: Mark H Weaver , 30116@debbugs.gnu.org

Maxim Cournoyer skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Maxim Cournoyer skribis:
>>
>>> In the `patch-el-files' phase of the emacs-build-system, we find the
>>> following snippet:
>>>
>>>     (with-directory-excursion el-dir
>>>       ;; Some old '.el' files (e.g., tex-buf.el in AUCTeX) are still encoded
>>>       ;; with the "ISO-8859-1" locale.
>>>       (unless (false-if-exception (substitute-cmd))
>>>         (with-fluids ((%default-port-encoding "ISO-8859-1"))
>>>           (substitute-cmd))))
>>>
>>> In case an exception is raised while processing the file, it is
>>> retried with the "ISO-8859-1" encoding. Now, this resolves to a
>>> call to `open-file', whose documentation says:
>>>
>>>     ‘b’
>>>          Use binary mode, ensuring that each byte in the file will be
>>>          read as one Scheme character.
>>>
>>>          To provide this property, the file will be opened with the
>>>          8-bit character encoding "ISO-8859-1", ignoring the default
>>>          port encoding. *Note Ports::, for more information on port
>>>          encodings.
>>>
>>> So, by opening a file whose encoding is unknown as an ISO-8859-1 file,
>>> we are doing the same as if we had passed the 'binary option. Could this
>>> explain why we end up with NUL characters where we were expecting text?
>>
>> That could be the reason.
>> Guile provides a way to honor Emacs-style
>> ‘encoding’ declarations, and ‘call-with-input-file’ does that if we pass
>> #:guess-encoding #t (info "(guile) Character Encoding of Source Files").
>>
>> Did the faulty file have such a declaration?
>
> Sadly, it doesn't. Although even if it did, I don't think it would be
> very robust to expect every misbehaving file we might encounter to
> include one!

Sure, I was asking just because it’s an Emacs-related package.

> So I think we should apply my v2 patch to core-updates for now (see my
> previous reply on this thread), until we have our substitute routine
> implemented using srfi-115!

Sounds good! Note that I’ll wait until after the current ‘core-updates’
has been merged. Please do ping me if you think I’ve forgotten!

Thanks,
Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Wed Jun 13 21:40:57 2018
From: Maxim Cournoyer
To: ludo@gnu.org (Ludovic Courtès)
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com> <87h8rk4kl9.fsf@gnu.org>
Date: Wed, 13 Jun 2018 21:40:49 -0400
In-Reply-To: <87h8rk4kl9.fsf@gnu.org> ("Ludovic Courtès"'s message of "Wed, 17 Jan 2018 15:37:54 +0100")
Message-ID: <877en2taam.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
Cc: 30116@debbugs.gnu.org

ludo@gnu.org (Ludovic Courtès) writes:

> Maxim Cournoyer skribis:
>
>> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer
>> Date: Sun, 14 Jan 2018 20:31:33 -0500
>> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
>>  NUL chars.
>>
>> Fixes issue #30116.
>>
>> * guix/build/utils.scm (substitute): Add condition to skip lines containing
>> the NUL character.
>
> [...]
>
>> +           ((string-contains line (make-string 1 #\nul))
>
> Rather (string-index line #\nul).
>
>> +            ;; The regexp functions of the GNU C library (which Guile uses)
>> +            ;; cannot deal with NUL characters, so skip to the next line.
>> +            (format #t "skipping line with NUL characters: ~s\n" line)
>> +            (loop (read-line in 'concat)))
>
> Rather (format (current-error-port) …).
>
> It’s strange semantics, but it’s probably better than crashing in the
> contexts where we use it.
>
> Otherwise LGTM.  This would have to go to the next ‘core-updates’ (or
> ‘core-updates-next’ in the meantime.)
>
> Thanks!
>
> Ludo’.

Ping.  Is it the right time to merge this?
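
For reference, the branch with both review comments applied could look roughly like the sketch below.  It is not the patch itself: `copy-lines' and `copy-string' are hypothetical stand-ins for the loop inside `substitute' (which really works on files through `with-atomic-file-replacement'), and TRANSFORM stands in for the regexp substitutions.  The sketch also assumes that a line containing NUL should be written back unchanged rather than dropped, which the quoted hunk does not do.

```scheme
(use-modules (ice-9 rdelim))            ; for read-line

;; Hypothetical stand-in for the loop in `substitute': copy IN to OUT,
;; applying TRANSFORM to every line except those that contain NUL.
(define (copy-lines in out transform)
  (let loop ((line (read-line in 'concat)))
    (cond
     ((eof-object? line)
      #t)
     ((string-index line #\nul)
      ;; Report on the error port, as suggested in the review.
      (format (current-error-port)
              "not substituting in line with NUL characters: ~s\n" line)
      ;; Assumption of this sketch: keep the line as-is instead of losing it.
      (display line out)
      (loop (read-line in 'concat)))
     (else
      (display (transform line) out)
      (loop (read-line in 'concat))))))

;; Hypothetical convenience wrapper so the loop can be tried on strings.
(define (copy-string str transform)
  (call-with-output-string
    (lambda (out)
      (call-with-input-string str
        (lambda (in)
          (copy-lines in out transform))))))
```

Calling (copy-string text string-upcase) on a text whose second line contains a NUL should upcase the other lines and pass the second one through untouched.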
Maxim

From debbugs-submit-bounces@debbugs.gnu.org Thu Jun 14 03:04:14 2018
From: Mark H Weaver
To: Maxim Cournoyer
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com>
Date: Thu, 14 Jun 2018 03:02:47 -0400
In-Reply-To: <87k1wjc35d.fsf_-_@gmail.com> (Maxim Cournoyer's message of "Sun, 14 Jan 2018 20:38:22 -0500")
Message-ID: <87r2l9q294.fsf@netris.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cc: Ludovic Courtès, 30116@debbugs.gnu.org

Hi Maxim,

Thanks for working on this.  I found a problem with this patch,
and I also have a suggestion.  Please see below.
Maxim Cournoyer writes:

> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer
> Date: Sun, 14 Jan 2018 20:31:33 -0500
> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
>  NUL chars.
>
> Fixes issue #30116.
>
> * guix/build/utils.scm (substitute): Add condition to skip lines containing
> the NUL character.
> ---
>  guix/build/utils.scm | 44 ++++++++++++++++++++++++++------------------
>  1 file changed, 26 insertions(+), 18 deletions(-)
>
> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
> index 7391307c8..975f4e70a 100644
> --- a/guix/build/utils.scm
> +++ b/guix/build/utils.scm
> @@ -3,6 +3,7 @@
>  ;;; Copyright © 2013 Andreas Enge
>  ;;; Copyright © 2013 Nikita Karetnikov
>  ;;; Copyright © 2015 Mark H Weaver
> +;;; Copyright © 2018 Maxim Cournoyer
>  ;;;
>  ;;; This file is part of GNU Guix.
>  ;;;
> @@ -621,28 +622,35 @@ PROC as (PROC LINE MATCHES); PROC must return the line that will be written as
>  a substitution of the original line.  Be careful about using '$' to match the
>  end of a line; by itself it won't match the terminating newline of a line."
>    (let ((rx+proc  (map (match-lambda
> -                        (((? regexp? pattern) . proc)
> -                         (cons pattern proc))
> -                        ((pattern . proc)
> -                         (cons (make-regexp pattern regexp/extended)
> -                               proc)))
> +                         (((? regexp? pattern) . proc)
> +                          (cons pattern proc))
> +                         ((pattern . proc)
> +                          (cons (make-regexp pattern regexp/extended)
> +                                proc)))
>                         pattern+procs)))
>      (with-atomic-file-replacement file
>        (lambda (in out)
>          (let loop ((line (read-line in 'concat)))
> -          (if (eof-object? line)
> -              #t
> -              (let ((line (fold (lambda (r+p line)
> -                                  (match r+p
> -                                    ((regexp . proc)
> -                                     (match (list-matches regexp line)
> -                                       ((and m+ (_ _ ...))
> -                                        (proc line m+))
> -                                       (_ line)))))
> -                                line
> -                                rx+proc)))
> -                (display line out)
> -                (loop (read-line in 'concat)))))))))
> +          (cond
> +           ((eof-object? line)
> +            #t)
> +           ((string-contains line (make-string 1 #\nul))
> +            ;; The regexp functions of the GNU C library (which Guile uses)
> +            ;; cannot deal with NUL characters, so skip to the next line.
> +            (format #t "skipping line with NUL characters: ~s\n" line)
> +            (loop (read-line in 'concat)))

This code will unconditionally *delete* all lines that contain NULs.

This will happen because the lines with NULs are not being written to
the output file, which will replace the original file when this loop
reaches EOF.  So, any lines that are not copied to the output will be
lost.

To preserve the lines with NULs, you should call (display line out)
before calling 'loop'.

Also, please use (string-index line #\nul) to check for NULs instead of
'string-contains'.  It should be more efficient.

> +           (else
> +            (let ((line (fold (lambda (r+p line)
> +                                (match r+p
> +                                  ((regexp . proc)
> +                                   (match (list-matches regexp line)
> +                                     ((and m+ (_ _ ...))
> +                                      (proc line m+))
> +                                     (_ line)))))
> +                              line
> +                              rx+proc)))
> +              (display line out)
> +              (loop (read-line in 'concat))))))))))

With the changes suggested above, I would have no objection to pushing
this to core-updates.  However, it occurs to me that we could handle the
NUL case in a better way:

Since the C regex functions that we use cannot handle NUL bytes, we
could use a different code point to represent NUL during those
operations.  We could choose a code point from one of the Unicode
Private Use Areas that does not occur in the string.

Let NUL* be the code point which will represent NUL bytes.  First
replace all NULs with NUL*s, then perform the substitutions, and finally
replace all ALT*s with NULs before writing to the output.

As an important optimization, we should avoid performing these extra
operations unless (string-index line #\nul) finds a NUL.

We could then perform these extra substitutions with simple, inefficient
code.
Maybe something like this (untested):

--8<---------------cut here---------------start------------->8---
    (with-atomic-file-replacement file
      (lambda (in out)
        (let loop ((line (read-line in 'concat)))
          (if (eof-object? line)
              #t
              (let* ((nul*   (or (and (string-index line #\nul)
                                      (unused-private-use-code-point line))
                                 #\nul))
                     (line*  (replace-char #\nul nul* line))
                     (line1* (fold (lambda (r+p line)
                                     (match r+p
                                       ((regexp . proc)
                                        (match (list-matches regexp line)
                                          ((and m+ (_ _ ...))
                                           (proc line m+))
                                          (_ line)))))
                                   line*
                                   rx+proc))
                     (line1  (replace-char nul* #\nul line1*)))
                (display line1 out)
                (loop (read-line in 'concat)))))))))
--8<---------------cut here---------------end--------------->8---

Where the following additional private procedures would be added to
(guix build utils) above the definition for 'substitute':

--8<---------------cut here---------------start------------->8---
(define (unused-private-use-code-point s)
  "Find a code point within a Unicode Private Use Area that is not
present in S, and return the corresponding character object.  If one
cannot be found, return false."
  (define (scan lo hi)
    (and (<= lo hi)
         (let ((c (integer->char lo)))
           (if (string-index s c)
               (scan (+ lo 1) hi)
               c))))
  (or (scan   #xE000   #xF8FF)
      (scan  #xF0000  #xFFFFD)
      (scan #x100000 #x10FFFD)))

(define (replace-char c1 c2 s)
  "Return a string which is equal to S except with all instances of C1
replaced by C2.  If C1 and C2 are equal, return S."
  (if (char=? c1 c2)
      s
      (string-map (lambda (c)
                    (if (char=? c c1)
                        c2
                        c))
                  s)))
--8<---------------cut here---------------end--------------->8---

What do you think?
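
The placeholder round trip can also be tried in isolation.  The sketch below repeats the two helper procedures so that it runs on its own; `call-with-nul-placeholder' is a hypothetical wrapper introduced only for this illustration, and `string-upcase' stands in for the regexp substitutions, so nothing here exercises the C regex functions themselves.

```scheme
;; Helpers repeated from the proposal so that this block runs standalone.
(define (unused-private-use-code-point s)
  (define (scan lo hi)
    (and (<= lo hi)
         (let ((c (integer->char lo)))
           (if (string-index s c)
               (scan (+ lo 1) hi)
               c))))
  (or (scan   #xE000   #xF8FF)
      (scan  #xF0000  #xFFFFD)
      (scan #x100000 #x10FFFD)))

(define (replace-char c1 c2 s)
  (if (char=? c1 c2)
      s
      (string-map (lambda (c) (if (char=? c c1) c2 c)) s)))

;; Hypothetical wrapper: call TRANSFORM on LINE with every NUL hidden
;; behind a placeholder, then restore the NULs in the result.
(define (call-with-nul-placeholder transform line)
  (let* ((nul*  (or (and (string-index line #\nul)
                         (unused-private-use-code-point line))
                    #\nul))
         (line* (replace-char #\nul nul* line)))
    (replace-char nul* #\nul (transform line*))))

(define sample (string #\a #\nul #\b))

;; SEEN records what TRANSFORM actually received.
(define seen #f)
(define result
  (call-with-nul-placeholder
   (lambda (l) (set! seen l) (string-upcase l))
   sample))
```

Here SEEN should contain no NUL, while RESULT should have the NUL back in place between the upcased letters.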
Mark

From debbugs-submit-bounces@debbugs.gnu.org Thu Jun 14 04:01:25 2018
From: ludo@gnu.org (Ludovic Courtès)
To: Maxim Cournoyer
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com> <87h8rk4kl9.fsf@gnu.org> <877en2taam.fsf@gmail.com>
Date: Thu, 14 Jun 2018 10:01:09 +0200
In-Reply-To: <877en2taam.fsf@gmail.com> (Maxim Cournoyer's message of "Wed, 13 Jun 2018 21:40:49 -0400")
Message-ID: <87r2l923wa.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cc: 30116@debbugs.gnu.org

Hello Maxim,

Maxim Cournoyer skribis:

> ludo@gnu.org (Ludovic Courtès) writes:

[...]

>> Otherwise LGTM.  This would have to go to the next ‘core-updates’ (or
>> ‘core-updates-next’ in the meantime.)
>>
>> Thanks!
>>
>> Ludo’.
>
> Ping.  Is it the right time to merge this?

Yes you can push it to ‘core-updates’ now.

Thank you!

Ludo’.
From debbugs-submit-bounces@debbugs.gnu.org Thu Jun 14 04:02:43 2018
From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com> <87r2l9q294.fsf@netris.org>
Date: Thu, 14 Jun 2018 10:02:23 +0200
In-Reply-To: <87r2l9q294.fsf@netris.org> (Mark H.
Weaver's message of "Thu, 14 Jun 2018 03:02:47 -0400")
Message-ID: <87muvx23u8.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cc: 30116@debbugs.gnu.org, Maxim Cournoyer

Mark H Weaver skribis:

> Thanks for working on this.  I found a problem with this patch,
> and I also have a suggestion.  Please see below.

I hadn’t seen Mark’s reply, which raises valid concerns.  Please
dismiss the message I just sent, Maxim.

Ludo’.
From debbugs-submit-bounces@debbugs.gnu.org Sat Jun 16 12:47:13 2018
From: Maxim Cournoyer
To: Mark H Weaver
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com> <87r2l9q294.fsf@netris.org>
Date: Sat, 16 Jun 2018 12:47:01 -0400
In-Reply-To: <87r2l9q294.fsf@netris.org> (Mark H. Weaver's message of "Thu, 14 Jun 2018 03:02:47 -0400")
Message-ID: <87tvq2smpm.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
Cc: Ludovic Courtès, 30116@debbugs.gnu.org

Hi Mark,

Mark H Weaver writes:

> Hi Maxim,
>
> Thanks for working on this.  I found a problem with this patch,
> and I also have a suggestion.  Please see below.
>
> Maxim Cournoyer writes:
>
>> From 9891e428eae0ed24e0d61862b3f5e298606b79eb Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer
>> Date: Sun, 14 Jan 2018 20:31:33 -0500
>> Subject: [PATCH] utils: Prevent substitute from crashing on files containing
>>  NUL chars.
>>
>> Fixes issue #30116.
>>
>> * guix/build/utils.scm (substitute): Add condition to skip lines containing
>> the NUL character.
>> ---
>>  guix/build/utils.scm | 44 ++++++++++++++++++++++++++------------------
>>  1 file changed, 26 insertions(+), 18 deletions(-)
>>
>> diff --git a/guix/build/utils.scm b/guix/build/utils.scm
>> index 7391307c8..975f4e70a 100644
>> --- a/guix/build/utils.scm
>> +++ b/guix/build/utils.scm
>> @@ -3,6 +3,7 @@
>>  ;;; Copyright © 2013 Andreas Enge
>>  ;;; Copyright © 2013 Nikita Karetnikov
>>  ;;; Copyright © 2015 Mark H Weaver
>> +;;; Copyright © 2018 Maxim Cournoyer
>>  ;;;
>>  ;;; This file is part of GNU Guix.
>>  ;;;
>> @@ -621,28 +622,35 @@ PROC as (PROC LINE MATCHES); PROC must return the line that will be written as
>>  a substitution of the original line.  Be careful about using '$' to match the
>>  end of a line; by itself it won't match the terminating newline of a line."
>>    (let ((rx+proc  (map (match-lambda
>> -                        (((? regexp? pattern) . proc)
>> -                         (cons pattern proc))
>> -                        ((pattern . proc)
>> -                         (cons (make-regexp pattern regexp/extended)
>> -                               proc)))
>> +                         (((? regexp? pattern) . proc)
>> +                          (cons pattern proc))
>> +                         ((pattern . proc)
>> +                          (cons (make-regexp pattern regexp/extended)
>> +                                proc)))
>>                         pattern+procs)))
>>      (with-atomic-file-replacement file
>>        (lambda (in out)
>>          (let loop ((line (read-line in 'concat)))
>> -          (if (eof-object? line)
>> -              #t
>> -              (let ((line (fold (lambda (r+p line)
>> -                                  (match r+p
>> -                                    ((regexp . proc)
>> -                                     (match (list-matches regexp line)
>> -                                       ((and m+ (_ _ ...))
>> -                                        (proc line m+))
>> -                                       (_ line)))))
>> -                                line
>> -                                rx+proc)))
>> -                (display line out)
>> -                (loop (read-line in 'concat)))))))))
>> +          (cond
>> +           ((eof-object? line)
>> +            #t)
>> +           ((string-contains line (make-string 1 #\nul))
>> +            ;; The regexp functions of the GNU C library (which Guile uses)
>> +            ;; cannot deal with NUL characters, so skip to the next line.
>> +            (format #t "skipping line with NUL characters: ~s\n" line)
>> +            (loop (read-line in 'concat)))
>
> This code will unconditionally *delete* all lines that contain NULs.
>
> This will happen because the lines with NULs are not being written to
> the output file, which will replace the original file when this loop
> reaches EOF.  So, any lines that are not copied to the output will be
> lost.
>
> To preserve the lines with NULs, you should call (display line out)
> before calling 'loop'.

Good observation!  I agree that we should limit the effect of ignoring
NULs to the substitution only.

> Also, please use (string-index line #\nul) to check for NULs instead of
> 'string-contains'.  It should be more efficient.

OK!

>
>> +           (else
>> +            (let ((line (fold (lambda (r+p line)
>> +                                (match r+p
>> +                                  ((regexp . proc)
>> +                                   (match (list-matches regexp line)
>> +                                     ((and m+ (_ _ ...))
>> +                                      (proc line m+))
>> +                                     (_ line)))))
>> +                              line
>> +                              rx+proc)))
>> +              (display line out)
>> +              (loop (read-line in 'concat))))))))))
>
> With the changes suggested above, I would have no objection to pushing
> this to core-updates.  However, it occurs to me that we could handle the
> NUL case in a better way:
>
> Since the C regex functions that we use cannot handle NUL bytes, we
> could use a different code point to represent NUL during those
> operations.  We could choose a code point from one of the Unicode
> Private Use Areas that does not occur in the string.
>
> Let NUL* be the code point which will represent NUL bytes.  First
> replace all NULs with NUL*s, then perform the substitutions, and finally
> replace all ALT*s with NULs before writing to the output.

Do I understand this transformation as NULs -> NUL*s and back from NUL*s
-> NULs correctly?  I'm not sure how NUL*s became ALT*s in your explanation.

> As an important optimization, we should avoid performing these extra
> operations unless (string-index line #\nul) finds a NUL.

OK.

> We could then perform these extra substitutions with simple, inefficient
> code.
> Maybe something like this (untested):
>
>     (with-atomic-file-replacement file
>       (lambda (in out)
>         (let loop ((line (read-line in 'concat)))
>           (if (eof-object? line)
>               #t
>               (let* ((nul*   (or (and (string-index line #\nul)
>                                       (unused-private-use-code-point line))
>                                  #\nul))
>                      (line*  (replace-char #\nul nul* line))
>                      (line1* (fold (lambda (r+p line)
>                                      (match r+p
>                                        ((regexp . proc)
>                                         (match (list-matches regexp line)
>                                           ((and m+ (_ _ ...))
>                                            (proc line m+))
>                                           (_ line)))))
>                                    line*
>                                    rx+proc))
>                      (line1  (replace-char nul* #\nul line1*)))
>                 (display line1 out)
>                 (loop (read-line in 'concat)))))))))
>
>
> Where the following additional private procedures would be added to
> (guix build utils) above the definition for 'substitute':
>
> (define (unused-private-use-code-point s)
>   "Find a code point within a Unicode Private Use Area that is not
> present in S, and return the corresponding character object.  If one
> cannot be found, return false."
>   (define (scan lo hi)
>     (and (<= lo hi)
>          (let ((c (integer->char lo)))
>            (if (string-index s c)
>                (scan (+ lo 1) hi)
>                c))))
>   (or (scan   #xE000   #xF8FF)
>       (scan  #xF0000  #xFFFFD)
>       (scan #x100000 #x10FFFD)))
>
> (define (replace-char c1 c2 s)
>   "Return a string which is equal to S except with all instances of C1
> replaced by C2.  If C1 and C2 are equal, return S."
>   (if (char=? c1 c2)
>       s
>       (string-map (lambda (c)
>                     (if (char=? c c1)
>                         c2
>                         c))
>                   s)))
>
> What do you think?

It raises the complexity level a bit for something which doesn't seem to
be a very common scenario, but otherwise seems a very elegant
workaround.  It seems to me that your implementation is already pretty
complete.  I'll try to write a test for validating it and report back.

Thank you for sharing your ideas!
Maxim

From debbugs-submit-bounces@debbugs.gnu.org Sun Jun 17 00:37:27 2018
From: Mark H Weaver
To: Maxim Cournoyer
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates))
References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com> <87r2l9q294.fsf@netris.org> <87tvq2smpm.fsf@gmail.com>
Date: Sun, 17 Jun 2018 00:36:00 -0400
In-Reply-To: <87tvq2smpm.fsf@gmail.com> (Maxim Cournoyer's message of "Sat, 16 Jun 2018 12:47:01 -0400")
Message-ID: <87d0wq3u8f.fsf@netris.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Cc: Ludovic Courtès, 30116@debbugs.gnu.org

Hi Maxim,

Maxim Cournoyer writes:

> Mark H Weaver writes:
>
>> With the changes suggested above, I would have no objection to pushing
>> this to core-updates.
>> However, it occurs to me that we could handle the
>> NUL case in a better way:
>>
>> Since the C regex functions that we use cannot handle NUL bytes, we
>> could use a different code point to represent NUL during those
>> operations.  We could choose a code point from one of the Unicode
>> Private Use Areas that does not occur in the string.
>>
>> Let NUL* be the code point which will represent NUL bytes.  First
>> replace all NULs with NUL*s, then perform the substitutions, and finally
>> replace all ALT*s with NULs before writing to the output.
>
> Do I understand this transformation as NULs -> NUL*s and back from NUL*s
> -> NULs correctly?  I'm not sure how NUL*s became ALT*s in your explanation.

Sorry, it's a typo.  Where I wrote "ALT*s", I meant to write "NUL*s".

>> What do you think?
>
> It raises the complexity level a bit for something which doesn't seem to
> be a very common scenario,

FWIW, I agree that it's not a common scenario, and it's not entirely
clear that it was worth the time I spent on it, or the added complexity.
On the other hand, I would dislike having a basic API like 'substitute*'
be subtly broken in this way.

> but otherwise seems a very elegant
> workaround.  It seems to me that your implementation is already pretty
> complete.  I'll try write a test for validating it and report back.

Sounds good.  Thank you!
Mark

From debbugs-submit-bounces@debbugs.gnu.org Fri Jan 08 14:15:06 2021
From: Maxim Cournoyer
To: Mark H Weaver
Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates)
References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com> <87r2l9q294.fsf@netris.org> <87tvq2smpm.fsf@gmail.com> <87d0wq3u8f.fsf@netris.org>
Date: Fri, 08 Jan 2021 14:14:56 -0500
In-Reply-To: <87d0wq3u8f.fsf@netris.org> (Mark H. Weaver's message of "Sun, 17 Jun 2018 00:36:00 -0400")
Message-ID: <87wnwn5jgv.fsf_-_@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Cc: Ludovic Courtès, 30116-done@debbugs.gnu.org

Hello Mark,

I was recently reminded of this bug by a new encounter; I at last wrote a
test for your proposed fix, and it appears to work as intended!  I've
committed it on your behalf in commit 485ac28235 on the core-updates
branch.

Closing!
Thank you for the clever hack :-) Maxim From debbugs-submit-bounces@debbugs.gnu.org Fri Jan 08 16:44:00 2021 Received: (at 30116-done) by debbugs.gnu.org; 8 Jan 2021 21:44:00 +0000 Received: from localhost ([127.0.0.1]:50824 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kxzY8-00057W-BA for submit@debbugs.gnu.org; Fri, 08 Jan 2021 16:44:00 -0500 Received: from world.peace.net ([64.112.178.59]:57080) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kxzY6-00057I-I7 for 30116-done@debbugs.gnu.org; Fri, 08 Jan 2021 16:43:58 -0500 Received: from mhw by world.peace.net with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kxzY0-0006bw-Cz; Fri, 08 Jan 2021 16:43:52 -0500 From: Mark H Weaver To: Maxim Cournoyer Subject: Re: bug#30116: [PATCH] `substitute' crashes when file contains NUL characters (core-updates) In-Reply-To: <87wnwn5jgv.fsf_-_@gmail.com> References: <87r2qrc3mq.fsf@gmail.com> <87k1wjc35d.fsf_-_@gmail.com> <87r2l9q294.fsf@netris.org> <87tvq2smpm.fsf@gmail.com> <87d0wq3u8f.fsf@netris.org> <87wnwn5jgv.fsf_-_@gmail.com> Date: Fri, 08 Jan 2021 16:42:43 -0500 Message-ID: <871rev14wh.fsf@netris.org> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 30116-done Cc: 30116-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Hi, Maxim Cournoyer writes: > I was recently reminded of this bug by a new encounter; at last wrote a > test for your proposed fix, and it appear to work as intended! I've > committed it on your behalf in commit 485ac28235 on the core-updates > branch. Thanks for taking care of this Maxim, and for adding the test case. 
Regards, Mark From unknown Mon Aug 18 04:42:28 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Sat, 06 Feb 2021 12:24:05 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator