GNU bug report logs - #25550
multibyte: uniq: special characters comparison


Package: coreutils

Reported by: David Loyall <david.loyall <at> the-good-guys.net>

Date: Thu, 26 Jan 2017 23:14:02 UTC

Severity: wishlist


From: David Loyall <david.loyall <at> the-good-guys.net>
To: 25550 <at> debbugs.gnu.org
Subject: bug#25550: Apparent unicode bug in uniq 8.26
Date: Thu, 26 Jan 2017 16:45:37 -0600
Hello.  I think I found a bug in uniq 8.26.

Here's a demo:

hobbes <at> metalbaby:~/e2-scratch$ cat faces_mre.txt
(◕‿◕)
(︺︹︺)

hobbes <at> metalbaby:~/e2-scratch$ uniq -c faces_mre.txt
2 (◕‿◕)

Here's some background info:

hobbes <at> metalbaby:~/e2-scratch$ od -x faces_mre.txt
0000000 e228 9597 80e2 e2bf 9597 0a29 ef28 bab8
0000020 b8ef efb9 bab8 0a29
0000030
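
The dump above is easier to read once the bytes are put back in file order. od -x prints 16-bit words in host byte order, so on a little-endian machine the two bytes of each word appear swapped. The following is a minimal sketch, assuming a little-endian host (the report does not say so, but the first word e228 only yields the leading "(" byte 0x28 under that assumption). The names OD_WORDS and od_words_to_bytes are hypothetical helpers, not part of any tool.

```python
# Words copied from the od -x output above (offset columns omitted).
OD_WORDS = (
    "e228 9597 80e2 e2bf 9597 0a29 ef28 bab8 "
    "b8ef efb9 bab8 0a29"
)


def od_words_to_bytes(words: str) -> bytes:
    """Undo od -x's 16-bit grouping, ASSUMING a little-endian host."""
    return b"".join(int(w, 16).to_bytes(2, "little") for w in words.split())


data = od_words_to_bytes(OD_WORDS)
lines = data.decode("utf-8").splitlines()

# Print code points only, so the output is ASCII-safe on any terminal.
for line in lines:
    print(" ".join(f"U+{ord(ch):04X}" for ch in line))
# U+0028 U+25D5 U+203F U+25D5 U+0029
# U+0028 U+FE3A U+FE39 U+FE3A U+0029
```

The two lines share only the ASCII parentheses; every other code point differs, so the file really does contain two distinct lines at the byte level.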

hobbes <at> metalbaby:~/e2-scratch$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

hobbes <at> metalbaby:~/e2-scratch$ uniq --version
uniq (GNU coreutils) 8.26
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Richard M. Stallman and David MacKenzie.

The bug disappears in the C locale.

hobbes <at> metalbaby:~/e2-scratch$ LC_COLLATE=c uniq -c faces_mre.txt
1 (◕‿◕)
1 (︺︹︺)
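
The difference between the two runs is consistent with uniq deciding "same line" through a locale-aware, strcoll-style comparison rather than a byte comparison. That is an assumption here, not something the report itself establishes. The sketch below models uniq -c as a run counter parameterized by an equality test; count_runs and collated are hypothetical names, and whether the collated variant reproduces the merged count depends on the collation tables of the local C library.

```python
import locale


def count_runs(lines, same):
    """Collapse adjacent lines that `same` treats as equal, like uniq -c."""
    runs = []
    for line in lines:
        if runs and same(runs[-1][1], line):
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, line])    # start a new run, keep its first line
    return [(count, text) for count, text in runs]


# The two lines from faces_mre.txt, written as escapes.
faces = ["(\u25d5\u203f\u25d5)", "(\ufe3a\ufe39\ufe3a)"]

# Byte-wise equality, which is how the C locale behaves.
bytewise = count_runs(faces, lambda a, b: a.encode("utf-8") == b.encode("utf-8"))


def collated(lines, name="en_US.UTF-8"):
    """Same count, but with strcoll equality (ASSUMED model of the report)."""
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None  # locale not installed on this system
    return count_runs(lines, lambda a, b: locale.strcoll(a, b) == 0)


print(bytewise)
print(collated(faces))
```

With byte-wise equality the two faces always form two runs of one line each. If the collated call reports a single run of two, the local collation tables give these symbols no distinguishing weight, which would match the behaviour in the report.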

I hope this helps.

Cheers,

--Dave Loyall
Omaha, Nebraska, USA




This bug report was last modified 6 years and 241 days ago.


