GNU bug report logs - #25455
uniq considers all the full-width punctuation and Japanese kana as the same under zh_CN.UTF-8 locale


Package: coreutils

Reported by: Icenowy Zheng <icenowy <at> aosc.xyz>

Date: Sun, 15 Jan 2017 23:10:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


From: "Mingye Wang (Arthur2e5)" <arthur2e5 <at> aosc.xyz>
To: icenowy <at> aosc.xyz, 25455 <at> debbugs.gnu.org
Subject: bug#25455: uniq considers all the full-width punctuation and Japanese kana as the same under zh_CN.UTF-8 locale
Date: Tue, 17 Jan 2017 18:22:48 +0000

15.01.2017, 20:01, "Icenowy Zheng" <icenowy <at> aosc.xyz>:
> Problem:
> When each input line contains only a Chinese full-width punctuation mark or a
> Japanese kana and the locale is zh_CN.UTF-8, the uniq command treats all of the
> lines as identical and wrongly removes distinct punctuation marks.

To narrow the scope down a bit, I should mention that LC_COLLATE is enough to trigger the bug:

printf '%s\n' 。 , ? ! a b c | LC_COLLATE=zh_CN.UTF-8 uniq
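
For comparison, here is a minimal sketch of the same pipeline with byte-wise comparison forced through LC_ALL=C. It is not something proposed in this report; it relies on the C locale comparing raw bytes, so distinct characters never compare equal. The helper name dedup_bytewise is hypothetical, and the arguments are quoted so the shell cannot treat any of them as a glob.

```shell
# Hypothetical helper: run uniq with locale-aware collation disabled.
# LC_ALL overrides LC_COLLATE, so the comparison falls back to raw bytes.
dedup_bytewise() {
    LC_ALL=C uniq
}

# Same kind of input as in the report: full-width punctuation plus ASCII letters.
# All seven lines are distinct byte sequences, so all seven should survive.
printf '%s\n' '。' ',' '?' '!' a b c | dedup_bytewise
```

Comparing the line count of this output with that of the LC_COLLATE=zh_CN.UTF-8 run shows whether the collation locale is what merges the lines.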

-- 
Regards,

Arthur2e5




This bug report was last modified 6 years and 265 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.