From unknown Sun Jun 22 04:21:18 2025
Subject: bug#29802: "uniq -c" doesn't like counting lines with nulls
To: 29802@debbugs.gnu.org
Message-ID: <975a0b067e19c1a9ca6b95be51cc8a00.squirrel@www.pkts.ca>
Date: Thu, 21 Dec 2017 00:40:34 -0800
From: "PD"

Uniq *sometimes* fails to combine lines containing a null character:

# uniq --version
uniq (GNU coreutils) 8.4

##### Count duplicate text lines:
# printf "\n\x00\n\x00\n" | cat -e | uniq -c
      1 $
      2 ^@$

##### Count duplicate binary lines:
# printf "\x00\n\x00\n\n" | uniq -c | cat -e
      2 ^@$
      1 $

##### Whoops, fail to count duplicate binary lines:
# printf "\n\x00\n\x00\n" | uniq -c | cat -e
      1 $
      1 ^@$
      1 ^@$

This was the smallest test case; the original file had hundreds of lines
with nulls (\x00) and Ctrl-A (\x01) characters, and it was quite a surprise
when the output of 'sort testfile | uniq -c' had many pages of '1 ^@$'
followed by '496 ^A$': it was counting the Ctrl-A lines correctly, but
failing on the null-character lines.

For automated testing with 'delta' or 'git bisect', this works:

---
#!/bin/bash
# $1 is the file to test; the two pipelines differ only in where 'cat -e' runs.
a=$(sort "$1" | cat -e | uniq -c | md5sum -)
b=$(sort "$1" | uniq -c | cat -e | md5sum -)
if [[ "$a" != "$b" ]]; then
    echo "PASS (bug present)"; exit 0
else
    echo "FAIL (bug absent)"; exit 1
fi
---

I regret not having the time to test this with coreutils 8.28, but I
couldn't see anything in the git log to suggest this has been fixed:
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=history;f=src/uniq.c;h=d1dac93c010d7333ced4b54fccbd965cbd5729c2;hb=HEAD

Cheers,
PD

From unknown Sun Jun 22 04:21:18 2025
To: PD, 29802@debbugs.gnu.org
From: Pádraig Brady
Date: Thu, 21 Dec 2017 16:33:52 +0000
Message-ID: <8ab074dc-de83-55d2-73ab-22a679b5e9ee@draigBrady.com>
In-Reply-To: <975a0b067e19c1a9ca6b95be51cc8a00.squirrel@www.pkts.ca>

On 21/12/17 08:40, PD wrote:
> ##### Whoops, fail to count duplicate binary lines:
> # printf "\n\x00\n\x00\n" | uniq -c | cat -e
>       1 $
>       1 ^@$
>       1 ^@$

Not reproducible on recent versions.
Might this have been specific to the i18n patch?
I.E. can you reproduce with LC_ALL=C set in the env?

thanks,
Pádraig

From unknown Sun Jun 22 04:21:18 2025
To: 29802@debbugs.gnu.org
From: Assaf Gordon
Date: Mon, 29 Oct 2018 20:20:38 -0600
Message-ID: <3ac86035-a95c-0546-88b3-b5d44ef1ee5a@gmail.com>
In-Reply-To: <8ab074dc-de83-55d2-73ab-22a679b5e9ee@draigBrady.com>

tags 29802 moreinfo
close 29802
stop

(triaging old bugs)

On 2017-12-21 9:33 a.m., Pádraig Brady wrote:
> On 21/12/17 08:40, PD wrote:
>> # printf "\n\x00\n\x00\n" | uniq -c | cat -e
>>       1 $
>>       1 ^@$
>>       1 ^@$
>
> Not reproducible on recent versions.
> Might this have been specific to the i18n patch?
> I.E. can you reproduce with LC_ALL=C set in the env?
>

With no further comments in almost a year, I'm closing this bug.
Discussion can continue by replying to this thread.

-assaf
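
For anyone revisiting this report: the question raised in the thread (does
the failure still show up with LC_ALL=C set?) was never answered. A minimal
sketch of that check follows. The helper names make_input and group_count
are made up for illustration, and en_US.UTF-8 is only a stand-in for
whatever locale the reporter's system used, which the thread does not
state; if that locale is not installed, uniq will most likely behave as in
the C locale. The script writes \000 rather than \x00 so that it also runs
under a plain POSIX sh.

```shell
#!/bin/sh
# Sketch only: make_input and group_count are hypothetical helper names.

# Emit the smallest failing input from the report: an empty line followed
# by two lines that each contain a single NUL byte.
make_input() {
    printf '\n\000\n\000\n'
}

# Print the number of groups "uniq -c" reports under the locale in $1.
# tr strips the padding that some wc implementations add.
group_count() {
    make_input | LC_ALL="$1" uniq -c | wc -l | tr -d ' '
}

# en_US.UTF-8 is an assumed example of a non-C locale.
for loc in C en_US.UTF-8; do
    echo "LC_ALL=$loc: $(group_count "$loc") group(s)"
done
```

A uniq that treats the two NUL lines as equal reports 2 groups; the failure
shown in the report would appear as 3. Seeing 3 only under the non-C locale
would be consistent with the i18n-patch suspicion, but that remains a
hypothesis from the thread, not a confirmed cause.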