GNU bug report logs - #9252: cut does not yet support unicode characters
Reported by: Danilo Moraes <moraesdno <at> gmail.com>
Date: Sat, 6 Aug 2011 01:54:06 UTC
Severity: normal
Tags: notabug
Merged with 9253
Done: Bob Proulx <bob <at> proulx.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9252 in the body.
You can then email your comments to 9252 AT debbugs.gnu.org in the normal way.
Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#9252; Package coreutils. (Sat, 06 Aug 2011 01:54:06 GMT)
Full text and rfc822 format available.
Acknowledgement sent to Danilo Moraes <moraesdno <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 06 Aug 2011 01:54:06 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I have found a little bug (I guess). See this:
a=danilo
echo $a | cut -c -5 # shows danil
a=dánilo
echo $a | cut -c 5 # shows dáni
The -b option works the same way. cut is ignoring the accented letters.
I read this in the info pages:
`-c CHARACTER-LIST'
`--characters=CHARACTER-LIST'
Select for printing only the characters in positions listed in
CHARACTER-LIST. The same as `-b' for now, but
internationalization will change that. Tabs and backspaces are
treated like any other character; they take up 1 character. If an
output delimiter is specified, (see the description of
`--output-delimiter'), then output that string between ranges of
selected bytes.
"The same as `-b' for now, but
internationalization will change that." Does this solve my problem? How does
it work?
Thanks,
Danilo S. Morães
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#9252; Package coreutils. (Sat, 06 Aug 2011 17:21:02 GMT)
Message #8 received at 9252 <at> debbugs.gnu.org (full text, mbox):
forcemerge 9252 9253
retitle 9252 cut does not yet support unicode characters
tags 9252 + notabug
close 9252
thanks
Danilo Moraes wrote:
> I have found a little bug (I guess). See this:
Thank you for the report. You have discovered that coreutils does not
yet have localization support for wide characters.
> a=danilo
> echo $a | cut -c -5 # shows danil
$ echo "danilo" | od -tx1 -c
0000000  64  61  6e  69  6c  6f  0a
          d   a   n   i   l   o  \n
> a=dánilo
> echo $a | cut -c 5 # shows dáni
I think you meant "cut -c-5" there.
$ echo "dánilo" | od -tx1 -c
0000000  64  c3  a1  6e  69  6c  6f  0a
          d 303 241   n   i   l   o  \n
As you can see, accented characters are not simple single-byte
characters. The od output shows their byte values. The accented 'a'
is encoded as two bytes (c3 a1), and because -c currently behaves like
-b, cut counts it as two positions.
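The byte/character difference can be checked directly. The following is a minimal sketch, assuming the input is valid UTF-8; the helper names byte_count and utf8_char_count are hypothetical and are not part of coreutils. It counts characters by discarding UTF-8 continuation bytes, so the result does not depend on the current locale.

```shell
# Hypothetical helpers for illustration; neither is part of coreutils.

# Number of bytes in the string.
byte_count() {
    n=$(printf '%s' "$1" | wc -c)
    echo $((n))
}

# Number of UTF-8 characters: delete continuation bytes (0x80-0xBF, octal
# 200-277), so exactly one byte remains per character. Assumes valid UTF-8.
utf8_char_count() {
    n=$(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c)
    echo $((n))
}

# "dánilo" spelled with octal escapes so the bytes are explicit.
s=$(printf 'd\303\241nilo')

echo "bytes: $(byte_count "$s")"
echo "characters: $(utf8_char_count "$s")"

# Selecting the first five bytes keeps 64 c3 a1 6e 69, which is "dáni".
printf '%s\n' "$s" | LC_ALL=C cut -b -5 | od -An -tx1
```

For this string the two counts differ by one, which is the extra byte of the accented 'a'.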
> The -b option works the same way. cut is ignoring the accented letters.
Sorry, but that code has not yet been written.
> I read in infopages this:
Thank you for consulting the documentation! And I say that
seriously. So many people ignore it. It is pleasant to hear that you
read it.
> `-c CHARACTER-LIST'
> `--characters=CHARACTER-LIST'
> Select for printing only the characters in positions listed in
> CHARACTER-LIST. The same as `-b' for now, but
> internationalization will change that. Tabs and backspaces are
> treated like any other character; they take up 1 character. If an
> output delimiter is specified, (see the description of
> `--output-delimiter'), then output that string between ranges of
> selected bytes.
>
> "The same as `-b' for now, but
> internationalization will change that." Does this solve my problem? How does
> it work?
Note that it says "internationalization /will/ change that", which
means it will change in the future. It is a future tense assertion.
It has not yet happened. Once that code is written and put into
coreutils, -c will select characters rather than bytes.
Note that some software distributions carry patches that add unicode
support to coreutils. But so far none of those patches have been
deemed appropriate to install in the upstream source, because of
maintainability problems such as code duplication.
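Until such support exists upstream, the first N characters of a UTF-8 string can be selected without relying on cut -c. The following is a rough sketch, not anything coreutils provides: first_chars is a hypothetical helper, and it assumes that iconv(1) and a head(1) supporting -c are installed. It converts to UTF-32BE, where every character occupies exactly four bytes, truncates by bytes, and converts back.

```shell
# Hypothetical helper, not part of coreutils: print the first N characters
# of a UTF-8 string. Assumes iconv(1) and a head(1) that supports -c.
first_chars() {
    count=$1
    text=$2
    # UTF-32BE uses exactly 4 bytes per character, so N characters are
    # the first 4*N bytes; convert back to UTF-8 afterwards.
    printf '%s' "$text" |
        iconv -f UTF-8 -t UTF-32BE |
        head -c $((count * 4)) |
        iconv -f UTF-32BE -t UTF-8
}

# "dánilo" spelled with octal escapes so the bytes are explicit.
s=$(printf 'd\303\241nilo')

first_chars 5 "$s"   # the five characters d, á, n, i, l
echo
```

"Character" here means a Unicode code point, so a letter followed by a separate combining accent counts as two. The sketch handles a single string rather than each line of a file.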
Because this is not a bug in cut and is also a well-known issue, I am
going to go ahead and close the report. But that does not mean no
further discussion is possible. Please feel free to respond;
discussion is encouraged.
Bob
Forcibly Merged 9252 9253.
Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Sat, 06 Aug 2011 17:21:02 GMT)
Changed bug title to 'cut does not yet support unicode characters' from 'a bug in cut'
Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Sat, 06 Aug 2011 17:21:02 GMT)
Added tag(s) notabug.
Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Sat, 06 Aug 2011 17:21:02 GMT)
bug closed, send any further explanations to 9252 <at> debbugs.gnu.org and Danilo Moraes <moraesdno <at> gmail.com>
Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Sat, 06 Aug 2011 17:21:03 GMT)
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#9252; Package coreutils. (Sat, 06 Aug 2011 20:21:02 GMT)
Message #19 received at 9252 <at> debbugs.gnu.org (full text, mbox):
Danilo,
> Thanks for replying so quickly. Now I understand what cut was doing with my
> string. :)
> I'm Brazilian and my English is very, very weak.
>
> > Note that it says "internationalization /will/ change that" which
> > means will change it in the future. It is a future tense assertion.
>
> This is the proof. I read it but did not pay attention to the "will". hehe
>
> Once more, thanks for replying.
Happy to help!
Bob
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 04 Sep 2011 11:24:04 GMT)
This bug report was last modified 13 years and 293 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.