GNU bug report logs - #36718
uniq treats distinct Korean characters equal

Reported by: Felix Hamme <fhamme <at> united-internet.de>

Date: Thu, 18 Jul 2019 14:49:01 UTC

Severity: normal


From: Felix Hamme <fhamme <at> united-internet.de>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 36718 <at> debbugs.gnu.org
Cc: Gerhard Dittes <gerhard.dittes <at> ionos.com>
Subject: bug#36718: uniq treats distinct Korean characters equal
Date: Fri, 19 Jul 2019 12:18:32 +0200

Thanks @Paul Eggert, it seems like this isn't a bug at all.

My locale (de_DE.utf8) appears to lack collation definitions for the
Korean characters in question. After switching my system language to
Korean (ko_KR.utf8), uniq produces the expected output.
For my purposes, I'll set LC_COLLATE=C in my environment, which forces
byte-wise comparison and should work for all languages.
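
For anyone landing here with the same problem, a minimal sketch of the
workaround follows. The helper name dedupe_bytewise is made up for
illustration, and the two sample strings are the ones from the title of
the Stack Exchange question linked below, not the Korean characters from
the original report. The sketch uses LC_ALL=C rather than LC_COLLATE=C
only because LC_ALL, when set, takes precedence over LC_COLLATE; if
LC_ALL is unset in your environment, LC_COLLATE=C should behave the same
way for uniq.

```shell
#!/bin/sh
# Hypothetical helper: run uniq in the C locale so that adjacent lines
# are compared byte by byte instead of through the locale's collation
# rules. LC_ALL is used here (an assumption of this sketch) because it
# overrides LC_COLLATE whenever both are set.
dedupe_bytewise() {
    LC_ALL=C uniq
}

# Two distinct lines that differ only in character order.
# Their UTF-8 byte sequences differ, so both lines are kept.
printf '%s\n' 'あい' 'いあ' | dedupe_bytewise
```

Truly identical adjacent lines are still collapsed as usual; only the
comparison method changes.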

Admittedly, I could have searched for this first:
https://unix.stackexchange.com/questions/373848/why-does-uniq-think-%E3%81%82%E3%81%84-and-%E3%81%84%E3%81%82-are-the-same

This bug report was last modified 6 years and 23 days ago.


