GNU bug report logs - #38627
uniq -c gets wrong count with non-ascii strings

Package: coreutils

Reported by: Roy Smith <roy <at> panix.com>

Date: Sun, 15 Dec 2019 19:41:01 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.


Message #31 received at 38627 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: 38627 <at> debbugs.gnu.org
Cc: roy <at> panix.com, P <at> draigBrady.com
Subject: Re: bug#38627: uniq -c gets wrong count with non-ascii strings
Date: Sun, 23 Feb 2020 21:02:30 +0100
On Feb 23 2020, Pádraig Brady wrote:

> On 17/12/2019 17:25, Roy Smith wrote:
>> I stopped short of actually building uniq.c from source (bootstrap, prerequisites, ...), but looking at the code, it looks like the call chain is:
>>
>> different()
>> xmemcoll()
>> memcoll()
>> strcoll()
>>
>> so I tried a little test at the strcoll() level:
>>
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <string.h>
>>
>> int
>> main (int argc, char **argv)
>> {
>>    unsigned char null[] = {
>>
>>      0342, 0201, 0277, 0341, 0265, 0230, 0313, 0241, 0313, 0241, 0
>>    };
>>    unsigned char iraq[] = {
>>      0334, 0245, 0334, 0235, 0334, 0252, 0334, 0220, 0334, 0251, 0};
>>
>>    printf("%s\n", null);
>>    printf("%s\n", iraq);
>>
>>    int m = strcoll(null, iraq);
>>    printf("m = %d\n", m);
>> }

This test lacks a setlocale call, so strcoll never sees the user's locale.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




This bug report was last modified 5 years and 90 days ago.
