GNU bug report logs - #16168
uniq mis-handles UTF8 (8bit) characters
Reported by: Shlomo Urbach <urbach <at> google.com>
Date: Mon, 16 Dec 2013 16:56:03 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16168 in the body.
You can then email your comments to 16168 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#16168; Package coreutils. (Mon, 16 Dec 2013 16:56:03 GMT)
Acknowledgement sent to Shlomo Urbach <urbach <at> google.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 16 Dec 2013 16:56:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Lines with CJK letters are deemed equal by length only, since the
characters seem to be ignored.
I understand this is due to locale.
But, it would be nice if a simple flag would do a locale-free comparison
(i.e. equal = all bytes are equal).
Reply sent to Pádraig Brady <P <at> draigBrady.com>: You have taken responsibility. (Mon, 16 Dec 2013 17:34:02 GMT)
Notification sent to Shlomo Urbach <urbach <at> google.com>: bug acknowledged by developer. (Mon, 16 Dec 2013 17:34:02 GMT)
Message #10 received at 16168-done <at> debbugs.gnu.org (full text, mbox):
tag 16168 notabug
close 16168
stop
On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
> Lines with CJK letters are deemed equal by length only, since the
> characters seem to be ignored.
> I understand this is due to locale.
> But, it would be nice if a simple flag would do a locale-free comparison
> (i.e. equal = all bytes are equal).
If you want to compare byte by byte:
LC_ALL=C uniq ....
thanks,
Pádraig.
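For readers following along, here is a minimal sketch of the byte-by-byte comparison suggested above. The sample input (two different CJK lines of equal length, then a repeated line) is made up for illustration and does not come from the original report.

```shell
# Illustrative input: two distinct CJK lines of equal length,
# followed by a genuine duplicate of the second line.
input=$(printf '你好\n世界\n世界\n')

# The LC_ALL=C prefix applies to this one uniq invocation only,
# so lines are compared as raw bytes regardless of the user's locale.
deduped=$(printf '%s\n' "$input" | LC_ALL=C uniq)

printf '%s\n' "$deduped"
```

With byte-wise comparison, only the genuine duplicate is collapsed, so both distinct lines remain in the output.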
Information forwarded to bug-coreutils <at> gnu.org: bug#16168; Package coreutils. (Mon, 16 Dec 2013 18:03:01 GMT)
Message #13 received at 16168 <at> debbugs.gnu.org (full text, mbox):
Maybe he was hoping for a uniq [-b|--bytes] ?
Suggestion to Shlomo (if you use bash):
alias uniq='LC_ALL=C \uniq'
or, if you want it in your shell scripts too:
uniq() { LC_ALL=C "$(type -P uniq)" "$@" ; }; export -f uniq
On 12/16/2013 9:33 AM, Pádraig Brady wrote:
> tag 16168 notabug
> close 16168
> stop
>
> On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
>> Lines with CJK letters are deemed equal by length only, since the
>> characters seem to be ignored.
>> I understand this is due to locale.
>> But, it would be nice if a simple flag would do a locale-free comparison
>> (i.e. equal = all bytes are equal).
>
> If you want to compare byte by byte:
>
> LC_ALL=C uniq ....
>
> thanks,
> Pádraig.
>
>
>
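A variant of the wrapper idea from this message, as a sketch only. The function name `uniq_bytes` is hypothetical and was picked so that the real `uniq` is not shadowed; the sample input is likewise made up.

```shell
# Hypothetical helper (the name uniq_bytes is an assumption, not from the thread).
uniq_bytes() {
    # The VAR=value prefix scopes LC_ALL=C to this single command.
    # `command` skips any alias or shell function that is also named uniq.
    LC_ALL=C command uniq "$@"
}

# Usage: all uniq options are passed through unchanged.
printf 'a\na\nb\n' | uniq_bytes -c
```

Placing the assignment directly before the command, rather than as a separate statement, keeps the locale override from leaking into the rest of the shell session.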
Information forwarded to bug-coreutils <at> gnu.org: bug#16168; Package coreutils. (Mon, 16 Dec 2013 20:20:02 GMT)
Message #16 received at 16168 <at> debbugs.gnu.org (full text, mbox):
Thanks,
this works great.
But, I'm sure the general public doesn't know of this issue.
Shlomo
On Mon, Dec 16, 2013 at 8:02 PM, Linda Walsh <coreutils <at> tlinx.org> wrote:
> Maybe he was hoping for a uniq [-b|--bytes] ?
>
> Suggestion to Shlomo (if you use bash):
>
> alias uniq='LC_ALL=C \uniq'
>
> or, if you want it in your shell scripts too:
>
> uniq() { LC_ALL=C "$(type -P uniq)" "$@" ; }; export -f uniq
>
>
>
> On 12/16/2013 9:33 AM, Pádraig Brady wrote:
>
>> tag 16168 notabug
>> close 16168
>> stop
>>
>> On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
>>
>>> Lines with CJK letters are deemed equal by length only, since the
>>> characters seem to be ignored.
>>> I understand this is due to locale.
>>> But, it would be nice if a simple flag would do a locale-free comparison
>>> (i.e. equal = all bytes are equal).
>>>
>>
>> If you want to compare byte by byte:
>>
>> LC_ALL=C uniq ....
>>
>> thanks,
>> Pádraig.
>>
>>
>>
>>
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 14 Jan 2014 12:24:04 GMT)
This bug report was last modified 11 years and 164 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.