GNU bug report logs - #25455
uniq considers all the full-width punctuation and Japanese kana as the same under zh_CN.UTF-8 locale


Package: coreutils

Reported by: Icenowy Zheng <icenowy <at> aosc.xyz>

Date: Sun, 15 Jan 2017 23:10:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.



Message #11 received at 25455 <at> debbugs.gnu.org (full text, mbox):

From: Mike Frysinger <vapier <at> gentoo.org>
To: Icenowy Zheng <icenowy <at> aosc.xyz>
Cc: 25455 <at> debbugs.gnu.org, arthur2e5 <at> aosc.xyz
Subject: Re: bug#25455: uniq considers all the full-width punctuation and
 Japanese kana as the same under zh_CN.UTF-8 locale
Date: Fri, 20 Jan 2017 22:08:33 -0500
On 16 Jan 2017 04:01, Icenowy Zheng wrote:
> When processing lines that each contain only a Chinese full-width punctuation
> mark or a Japanese kana under the zh_CN.UTF-8 locale, uniq treats all of the
> lines as identical and wrongly removes distinct punctuation marks.

this is a problem with glibc, not coreutils.  you can follow the upstream bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=13063
-mike
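
A rough Python sketch of why a collation problem in the C library would surface in uniq. It assumes that uniq drops a line whenever the locale's comparison function reports it equal to the previous line. That is an assumption for illustration, not something stated in this thread. The helper names (uniq_adjacent, bytewise, collate_in) are hypothetical. What the zh_CN.UTF-8 branch prints depends on the installed glibc and locale data, so no expected output is given for it.

```python
import locale


def uniq_adjacent(lines, compare):
    """Keep a line only if compare() says it differs from the last kept line."""
    kept = []
    for line in lines:
        if not kept or compare(kept[-1], line) != 0:
            kept.append(line)
    return kept


def bytewise(a, b):
    """Compare the UTF-8 bytes directly, ignoring locale collation rules."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    return (x > y) - (x < y)


def collate_in(locale_name):
    """Return locale.strcoll after switching LC_COLLATE, or None if unavailable."""
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    return locale.strcoll


# Two punctuation marks (one repeated) and two kana, one per line.
lines = ["\u3002", "\uff0c", "\uff0c", "\u3042", "\u3044"]

# Bytewise comparison removes only the true duplicate.
print("bytewise kept", len(uniq_adjacent(lines, bytewise)), "of", len(lines))

# If strcoll() returns 0 for distinct strings, everything collapses to one line.
coll = collate_in("zh_CN.UTF-8")
if coll is not None:
    print("zh_CN.UTF-8 kept", len(uniq_adjacent(lines, coll)), "of", len(lines))
```

Running uniq itself with LC_ALL=C forces bytewise comparison, which corresponds to the first case above.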

This bug report was last modified 6 years and 265 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.