GNU bug report logs - #38627
uniq -c gets wrong count with non-ascii strings
Reported by: Roy Smith <roy <at> panix.com>
Date: Sun, 15 Dec 2019 19:41:01 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
Message #31 received at 38627 <at> debbugs.gnu.org (full text, mbox):
On Feb 23 2020, Pádraig Brady wrote:
> On 17/12/2019 17:25, Roy Smith wrote:
>> I stopped short of actually building uniq.c from source (bootstrap, prerequisites, ...), but looking at the code, it looks like the call chain is:
>>
>> different()
>>   xmemcoll()
>>     memcoll()
>>       strcoll()
>>
>> so I tried a little test at the strcoll() level:
>>
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <string.h>
>>
>> int
>> main (int argc, char **argv)
>> {
>>   unsigned char null[] = {
>>     0342, 0201, 0277, 0341, 0265, 0230, 0313, 0241, 0313, 0241, 0
>>   };
>>   unsigned char iraq[] = {
>>     0334, 0245, 0334, 0235, 0334, 0252, 0334, 0220, 0334, 0251, 0
>>   };
>>
>>   printf("%s\n", null);
>>   printf("%s\n", iraq);
>>
>>   int m = strcoll(null, iraq);
>>   printf("m = %d\n", m);
>> }
This lacks setlocale.
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
This bug report was last modified 5 years and 90 days ago.