From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 05 13:27:37 2023
Date: Sun, 5 Feb 2023 18:27:28 +0000
From: Stephane Chazelas
To: bug-coreutils@gnu.org
Subject: wc -c doesn't advance stdin position when it's a regular file
Message-ID: <20230205182728.5i2oi23purlzp6jj@chazelas.org>

"wc -c" without filename arguments is meant to read stdin until
EOF and report the number of bytes it has read.

When stdin is on a regular file, GNU wc has an optimisation
whereby it skips the reading, does a pos = lseek(0,0,SEEK_CUR)
to find out its current position within the file, fstat(0) and
reports st_size - pos (assuming st_size > pos).

However, it does not move the position to the end of the file.
That means for instance that:

$ echo test > file
$ { wc -c; wc -c; } < file
5
5

Instead of 5, then 0. Similarly:

$ { wc -c; cat; } < file
5
test

So the optimisation is incomplete.

It also reports the size of the file even if it could not possibly read it
because it's not open in read mode:

$ { wc -c; } 0>> file
5

IMO, it should only do the optimisation if:
- fcntl(F_GETFL) shows that the file is open in O_RDONLY or O_RDWR mode
- the current checks for /proc- and /sys-like filesystems pass
- pos < st_size
- lseek(0,st_size,SEEK_SET) is successful.

(that leaves a race window above where it could move the cursor
backward, but I would think that can be ignored as if something
else reads at the same time, there's not much we can expect
anyway).
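
The four conditions listed above could be sketched in C roughly as follows. This is only an illustration under stated assumptions, not code from src/wc.c: the helper name count_by_seeking is hypothetical, and the existing /proc- and /sys-style filesystem checks are left out for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not taken from src/wc.c.  Return true and store
   the number of remaining bytes in *BYTES if the count could be derived
   from st_size and the offset was moved to the end of the file.
   Return false if the caller should count with read() instead.  */
bool
count_by_seeking (int fd, off_t *bytes)
{
  /* Condition 1: the descriptor must be open for reading.  */
  int flags = fcntl (fd, F_GETFL);
  if (flags < 0)
    return false;
  int access_mode = flags & O_ACCMODE;
  if (access_mode != O_RDONLY && access_mode != O_RDWR)
    return false;

  /* Only regular files have a meaningful st_size here.  */
  struct stat st;
  if (fstat (fd, &st) != 0 || !S_ISREG (st.st_mode))
    return false;

  /* Condition 3: pos < st_size.  */
  off_t pos = lseek (fd, 0, SEEK_CUR);
  if (pos < 0 || st.st_size <= pos)
    return false;

  /* Condition 4: moving the offset to the end must succeed.  */
  if (lseek (fd, st.st_size, SEEK_SET) < 0)
    return false;

  *bytes = st.st_size - pos;
  return true;
}
```

With stdin redirected from a 5-byte file, a first call would store 5 and leave the offset at the end of the file, so a second call returns false and a read()-based count would then report 0.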
-- 
Stephane

From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 05 15:00:09 2023
Date: Sun, 5 Feb 2023 19:59:58 +0000
From: Pádraig Brady
To: Stephane Chazelas, 61300@debbugs.gnu.org
Subject: Re: bug#61300: wc -c doesn't advance stdin position when it's a regular file
Message-ID: <3fddd0c6-7d8f-631e-f37a-f2635ba0268e@draigBrady.com>
In-Reply-To: <20230205182728.5i2oi23purlzp6jj@chazelas.org>
References: <20230205182728.5i2oi23purlzp6jj@chazelas.org>

On 05/02/2023 18:27, Stephane Chazelas wrote:
> "wc -c" without filename arguments is meant to read stdin until
> EOF and report the number of bytes it has read.
>
> When stdin is on a regular file, GNU wc has an optimisation
> whereby it skips the reading, does a pos = lseek(0,0,SEEK_CUR)
> to find out its current position within the file, fstat(0) and
> reports st_size - pos (assuming st_size > pos).
>
> However, it does not move the position to the end of the file.
> That means for instance that:
>
> $ echo test > file
> $ { wc -c; wc -c; } < file
> 5
> 5
>
> Instead of 5, then 0. Similarly:
>
> $ { wc -c; cat; } < file
> 5
> test
>
> So the optimisation is incomplete.
>
> It also reports the size of the file even if it could not possibly read it
> because it's not open in read mode:
>
> $ { wc -c; } 0>> file
> 5
>
> IMO, it should only do the optimisation if:
> - fcntl(F_GETFL) shows that the file is open in O_RDONLY or O_RDWR mode
> - the current checks for /proc- and /sys-like filesystems pass
> - pos < st_size
> - lseek(0,st_size,SEEK_SET) is successful.
>
> (that leaves a race window above where it could move the cursor
> backward, but I would think that can be ignored as if something
> else reads at the same time, there's not much we can expect
> anyway).

Yes I agree.
Adjusting would also avoid the following inconsistencies:

  $ { wc -c; wc -c; } < file
  5
  5
  $ { wc -l; wc -l; } < file
  1
  0
  $ truncate -s $(getconf PAGESIZE) file
  $ { wc -c; wc -c; } < file
  4096
  0

Hopefully the attached addresses this.
Note it doesn't add the constraint on the input being readable,
which I'll think a bit more about.
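
As a rough illustration of the behaviour the attached patch aims for, the sketch below derives the count from st_size, advances the offset by that amount, and falls back to read() whenever it cannot seek or has nothing to skip. The function names are hypothetical, and the real code in src/wc.c performs further checks (for example for /proc-like files) that are left out here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the bytes from the current offset to EOF by reading them.
   Return -1 on a read error.  */
off_t
count_by_reading (int fd)
{
  char buf[4096];
  off_t bytes = 0;
  for (;;)
    {
      ssize_t n = read (fd, buf, sizeof buf);
      if (n == 0)
        return bytes;
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      bytes += n;
    }
}

/* Hypothetical condensed form of the patched logic: use st_size only if
   the offset can also be advanced past the bytes being reported.  */
off_t
count_bytes (int fd)
{
  struct stat st;
  off_t pos = lseek (fd, 0, SEEK_CUR);

  if (0 <= pos && fstat (fd, &st) == 0 && S_ISREG (st.st_mode))
    {
      off_t bytes = st.st_size < pos ? 0 : st.st_size - pos;
      if (bytes && 0 <= lseek (fd, bytes, SEEK_CUR))
        return bytes;
    }

  /* Not seekable, not a regular file, or nothing left to skip.  */
  return count_by_reading (fd);
}
```

Calling count_bytes twice on the same descriptor corresponds to `{ wc -c; wc -c; } < file`: the first call consumes the input, so the second one finds nothing left to count.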

cheers,
Pádraig

[Attachment: wc-update-offset.patch]

From 42f72ec424e7eecd6b56c5b6fca5f377ff73795b Mon Sep 17 00:00:00 2001
From: Pádraig Brady <P@draigBrady.com>
Date: Sun, 5 Feb 2023 19:52:31 +0000
Subject: [PATCH] wc: ensure we update file offset

* src/wc.c (wc): Update the offset when not reading,
and do read if we can't update the offset.
* tests/misc/wc-proc.sh: Add a test case.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/61300
---
 NEWS                  |  4 ++++
 src/wc.c              |  5 ++++-
 tests/misc/wc-proc.sh | 12 ++++++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index b3cde4a01..1cea8cc32 100644
--- a/NEWS
+++ b/NEWS
@@ -57,6 +57,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   sized files larger than SIZE_MAX.
   [bug introduced in coreutils-8.24]
 
+  `wc -c` will again correctly update the read offset of inputs.
+  Previously it deduced the size of inputs while leaving the offset unchanged.
+  [bug introduced in coreutils-8.27]
+
 ** Changes in behavior
 
   Programs now support the new Ronna (R), and Quetta (Q) SI prefixes,
diff --git a/src/wc.c b/src/wc.c
index 5f3ef6eee..de04612e9 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -446,7 +446,10 @@ wc (int fd, char const *file_x, struct fstatus *fstatus, off_t current_pos)
                  beyond the end of the file.  As in the example above.  */
 
               bytes = end_pos < current_pos ? 0 : end_pos - current_pos;
-              skip_read = true;
+              if (bytes && 0 <= lseek (fd, bytes, SEEK_CUR))
+                skip_read = true;
+              else
+                bytes = 0;
             }
           else
             {
diff --git a/tests/misc/wc-proc.sh b/tests/misc/wc-proc.sh
index 5eb43b982..2b5026405 100755
--- a/tests/misc/wc-proc.sh
+++ b/tests/misc/wc-proc.sh
@@ -42,6 +42,18 @@ cat <<\EOF > exp
 EOF
 compare exp out || fail=1
 
+# Ensure we update the offset even when not reading,
+# which wasn't the case from coreutils-8.27 to coreutils-9.1
+{ wc -c; wc -c; } < no_read >  out || fail=1
+{ wc -c; wc -c; } < do_read >> out || fail=1
+cat <<\EOF > exp
+2
+0
+1048576
+0
+EOF
+compare exp out || fail=1
+
 # Ensure we don't read too much when reading,
 # as was the case on 32 bit systems
 # from coreutils-8.24 to coreutils-9.1
-- 
2.26.2

From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 05 15:59:59 2023
Date: Sun, 5 Feb 2023 12:59:49 -0800
From: Paul Eggert
Organization: UCLA Computer Science Department
To: Pádraig Brady, Stephane Chazelas, 61300@debbugs.gnu.org
Subject: Re: bug#61300: wc -c doesn't advance stdin position when it's a regular file
In-Reply-To: <3fddd0c6-7d8f-631e-f37a-f2635ba0268e@draigBrady.com>
References: <20230205182728.5i2oi23purlzp6jj@chazelas.org> <3fddd0c6-7d8f-631e-f37a-f2635ba0268e@draigBrady.com>

On 2023-02-05 11:59, Pádraig Brady wrote:

> Hopefully the attached addresses this.

Thanks for fixing that.

> Note it doesn't add the constraint on the input being readable,
> which I'll think a bit more about.

Let's leave that as-is, please. If 'wc' can output the correct value
without reading its input, POSIX does not require 'wc' to do the read,
and it seems perverse to modify 'wc' to go to the effort to refuse to
tell the user useful information that the user requested and that 'wc'
knows.

From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 06 01:27:14 2023
Date: Mon, 06 Feb 2023 06:27:02 +0000
From: Stephane Chazelas
To: Paul Eggert
Cc: Pádraig Brady, 61300@debbugs.gnu.org
Subject: Re: bug#61300: wc -c doesn't advance stdin position when it's a regular file
References: <20230205182728.5i2oi23purlzp6jj@chazelas.org> <3fddd0c6-7d8f-631e-f37a-f2635ba0268e@draigBrady.com>

On 2023-02-05 20:59, Paul Eggert wrote:
> On 2023-02-05 11:59, Pádraig Brady wrote:
[...]
> Let's leave that as-is, please. If 'wc' can output the correct value
> without reading its input, POSIX does not require 'wc' to do the read,
> and it seems perverse to modify 'wc' to go to the effort to refuse to
> tell the user useful information that the user requested and that 'wc'
> knows.
[...]

But while I would agree it's very unlikely to ever be hit in practice,
as I can't think of any reason why one would call wc with its input not
open for reading, wc is meant to report how many bytes it has read, not
the size of its input (though POSIX seems ambiguous on that).

See also (with Pádraig's patch applied):

$ { echo test > file; wc -c; echo test2 >&0; cat file; } 0> file
5
test
test2

wc has lseek()ed to the end of the file even though it was opened in
write-only mode.
Compare with:

$ { echo test > file; wc -lc; echo test2 >&0; cat file; } 0> file
wc: 'standard input': Bad file descriptor
0 0
test2

-- 
Stephane

From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 06 14:38:34 2023
Date: Mon, 6 Feb 2023 19:38:24 +0000
From: Pádraig Brady
To: Stephane Chazelas, Paul Eggert
Cc: 61300@debbugs.gnu.org
Subject: Re: bug#61300: wc -c doesn't advance stdin position when it's a regular file
References: <20230205182728.5i2oi23purlzp6jj@chazelas.org> <3fddd0c6-7d8f-631e-f37a-f2635ba0268e@draigBrady.com>

On 06/02/2023 06:27, Stephane Chazelas wrote:
> On 2023-02-05 20:59, Paul Eggert wrote:
>> On 2023-02-05 11:59, Pádraig Brady wrote:
> [...]
>> Let's leave that as-is, please. If 'wc' can output the correct value
>> without reading its input, POSIX does not require 'wc' to do the read,
>> and it seems perverse to modify 'wc' to go to the effort to refuse to
>> tell the user useful information that the user requested and that 'wc'
>> knows.
> [...]
>
> But while I would agree it's very unlikely to ever be hit in practice,
> as I can't think of any reason why one would call wc with its input not
> open for reading, wc is meant to report how many bytes it has read, not
> the size of its input (though POSIX seems ambiguous on that).
>
> See also (with Pádraig's patch applied):
>
> $ { echo test > file; wc -c; echo test2 >&0; cat file; } 0> file
> 5
> test
> test2
>
> wc has lseek()ed to the end of the file even though it was opened in
> write-only mode. Compare with:
>
> $ { echo test > file; wc -lc; echo test2 >&0; cat file; } 0> file
> wc: 'standard input': Bad file descriptor
> 0 0
> test2

Some more thoughts on this.

Note the original thread with the motivation for the st_size optimization is at:
https://lists.gnu.org/archive/html/coreutils/2016-03/msg00020.html
Note also wc -c has had an st_size optimization for all sizes
since the very first coreutils implementation.

A similar edge case to Stephane's above is also seen
when doing the lseek(near_end)+read() method, as shown by:

  $ { truncate -s 32768 file; wc -c; wc -c; } 0> file
  wc: 'standard input': Bad file descriptor
  28679
  wc: 'standard input': Bad file descriptor
  0

One possible solution to avoid the above issue is:

  start_pos = lseek(0, 0, SEEK_CUR);
  bytes += lseek(near_end);
  while (read()) {
    if (did_lseek && (read error == EBADF || read error == EINVAL)) {
      lseek(start_pos);
      did_lseek = false;
      bytes = 0;
      continue;
    }
  }

That would also fix an issue I saw for one file in /sys, where:

  /sys/devices/pci0000:00/0000:00:02.0/rom
  st_size = 131072, available bytes = 0, wc -c = 127007 (EINVAL)

Doing that method for all file sizes, rather than just using st_size,
would work but would also penalize performance for the common case.
Consider cached stats on a network file system, for example.

So I guess that, to be able to keep the st_size optimization with stdin
and stay consistent with the other cases, we could also restrict it
to readable input.

Note this is only an issue for stdin. Files specified on the command line
and explicitly opened should get a permission error at that stage.

Note also if you really want to read, you can always `cat | wc -c`
rather than just `wc -c`, so I'm still not sure we should add
the readable restriction for stdin, but I'm not strongly against it,
since it is such an edge case.

cheers,
Pádraig

From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 06 14:50:51 2023
Date: Mon, 6 Feb 2023 11:50:37 -0800
From: Paul Eggert
Organization: UCLA Computer Science Department
To: Pádraig Brady, Stephane Chazelas
Cc: 61300@debbugs.gnu.org
Subject: Re: bug#61300: wc -c doesn't advance stdin position when it's a regular file
References: <20230205182728.5i2oi23purlzp6jj@chazelas.org> <3fddd0c6-7d8f-631e-f37a-f2635ba0268e@draigBrady.com>

On 2/6/23 11:38, Pádraig Brady wrote:
> Note also if you really want to read, you can always `cat | wc -c`
> rather than just `wc -c`

Even that's not guaranteed, as 'cat' is not required to use the 'read'
system call if it can determine that the standard input contains only
NULs without calling 'read'. (GNU 'cat' doesn't do this, but POSIX
allows it.)

We shouldn't complicate 'wc' (thus slowing it down and worse, possibly
introducing a bug) if the only goal is to make 'wc' fail more often in
implausible scenarios.