GNU bug report logs - #25455
uniq considers all the full-width punctuation and Japanese kana as the same under zh_CN.UTF-8 locale
Reported by: Icenowy Zheng <icenowy <at> aosc.xyz>
Date: Sun, 15 Jan 2017 23:10:01 UTC
Severity: normal
Tags: notabug
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
Message #5 received at submit <at> debbugs.gnu.org:
Problem:
When each input line contains only a single Chinese full-width punctuation mark
or Japanese kana and the locale is zh_CN.UTF-8, the uniq command treats all of
these lines as identical and wrongly removes lines that are actually different.
Steps to reproduce:
Run the following command:
```
printf "%s\n" ， 。 ： ￥ あ か ア カ a b c , . : $ | LC_ALL=zh_CN.UTF-8 uniq
```
Comments:
The printf command prints out
```
，
。
：
￥
あ
か
ア
カ
a
b
c
,
.
:
$
```
Every line is different.
However, after piping this through uniq, the output is:
```
，
a
b
c
,
.
:
$
```
Under the zh_TW.UTF-8 locale the problem also occurs, but under ja_JP.UTF-8 or C it does not.
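To make the comparison across locales easier to repeat, here is a minimal sketch that counts how many lines survive uniq under each locale named above. The helper name `uniq_count` is hypothetical (it is not part of coreutils), and the sketch assumes the full-width characters ， ： ￥ as the first, third and fourth arguments. A count of 15 means no lines were merged; the zh_CN.UTF-8 output shown above corresponds to 8.

```shell
#!/bin/sh
# uniq_count LOCALE  (hypothetical helper, for illustration only)
# Feeds the 15 distinct lines from this report through uniq under the
# given locale and prints how many lines remain.
uniq_count() {
    printf '%s\n' ， 。 ： ￥ あ か ア カ a b c , . : '$' |
        LC_ALL="$1" uniq | wc -l | tr -d ' '
}

# Compare the locales mentioned in the report.
# Caveat: if a locale is not installed, uniq falls back to the C locale,
# so a result of 15 for that locale says nothing about the problem.
for loc in C zh_CN.UTF-8 zh_TW.UTF-8 ja_JP.UTF-8; do
    printf '%s: %s lines\n' "$loc" "$(uniq_count "$loc")"
done
```

Checking `locale -a` first shows which of these locales are actually installed on the test machine.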
Version info:
```
$ uniq --version
uniq (GNU coreutils) 8.26
... ...
$ /lib/libc.so.6
GNU C Library (2.24-2_AOSC_OS) stable release version 2.24, by Roland McGrath et al.
... ...
```
Architecture:
The test fails on both the x86_64 and armv7l architectures.
This bug report was last modified 6 years and 265 days ago.