From unknown Sat Jun 21 05:18:52 2025
Subject: bug#9252: a bug in cut
From: Danilo Moraes
To: 9252@debbugs.gnu.org
X-Debbugs-Original-To: bug-coreutils@gnu.org
Date: Fri, 5 Aug 2011 12:39:14 -0300
X-GNU-PR-Message: report 9252
X-GNU-PR-Package: coreutils

I have found a little bug (I guess). See this:

a=danilo
echo $a | cut -c -5 # shows danil

a=dánilo
echo $a | cut -c 5 # shows dáni

The -b option behaves the same way. cut is ignoring the letters with accents.

I read this in the info pages:

`-c CHARACTER-LIST'
`--characters=CHARACTER-LIST'
     Select for printing only the characters in positions listed in
     CHARACTER-LIST.  The same as `-b' for now, but
     internationalization will change that.  Tabs and backspaces are
     treated like any other character; they take up 1 character.  If an
     output delimiter is specified, (see the description of
     `--output-delimiter'), then output that string between ranges of
     selected bytes.

"The same as `-b' for now, but internationalization will change that."
Does this solve my problem? How does it work?

Thanks,

Danilo S. Morães

From unknown Sat Jun 21 05:18:52 2025
Subject: bug#9252: a bug in cut
From: Bob Proulx
To: Danilo Moraes
Cc: 9252@debbugs.gnu.org
Date: Sat, 6 Aug 2011 11:19:06 -0600
Message-ID: <20110806171906.GB16380@hysteria.proulx.com>
X-GNU-PR-Message: followup 9252
X-GNU-PR-Package: coreutils

forcemerge 9252 9253
retitle 9252 cut does not yet support unicode characters
tags 9252 + notabug
close 9252
thanks

Danilo Moraes wrote:
> I have found a little bug (I guess). See this:

Thank you for the report.  You have discovered that coreutils does not
yet have localization support for wide characters.

> a=danilo
> echo $a | cut -c -5 # shows danil

  $ echo "danilo" | od -tx1 -c
  0000000  64  61  6e  69  6c  6f  0a
            d   a   n   i   l   o  \n

> a=dánilo
> echo $a | cut -c 5 # shows dáni

I think you meant "cut -c-5" there.

  $ echo "dánilo" | od -tx1 -c
  0000000  64  c3  a1  6e  69  6c  6f  0a
            d 303 241   n   i   l   o  \n

As you can see, accented characters are not simple single-byte
characters.  The od output shows their byte values.  The accented 'a'
is two bytes wide.  This is why cut, which currently counts bytes even
with -c, treats it as two positions.

> The -b option behaves the same way. cut is ignoring the letters with accents.

Sorry, but that code has not yet been written.

> I read this in the info pages:

Thank you for consulting the documentation!  And I say that seriously.
So many people ignore it.  It is pleasant to hear that you read it.

> `-c CHARACTER-LIST'
> `--characters=CHARACTER-LIST'
>      Select for printing only the characters in positions listed in
>      CHARACTER-LIST.  The same as `-b' for now, but
>      internationalization will change that.  Tabs and backspaces are
>      treated like any other character; they take up 1 character.  If an
>      output delimiter is specified, (see the description of
>      `--output-delimiter'), then output that string between ranges of
>      selected bytes.
>
> "The same as `-b' for now, but internationalization will change that."
> Does this solve my problem? How does it work?

Note that it says "internationalization /will/ change that", which
means it will change in the future.  It is a future-tense assertion.
It has not yet happened.  When the code is written and put into
coreutils, cut will behave that way.

Note that some software distributions have patches that add unicode
support to coreutils.  But so far none of those patches have been
deemed appropriate to install in the upstream source because of
maintainability issues such as code duplication.

Because this is not a bug in cut and is also a well-known issue, I am
going to go ahead and close the report.  But that does not mean no
further discussion is possible.  Please feel free to respond.
Discussion may still continue and is encouraged.

Bob

From unknown Sat Jun 21 05:18:52 2025
Subject: bug#9252: a bug in cut
From: Bob Proulx
To: Danilo Moraes
Cc: 9252@debbugs.gnu.org
Date: Sat, 6 Aug 2011 14:19:08 -0600
Message-ID: <20110806201908.GA26137@hysteria.proulx.com>
References: <20110806171906.GB16380@hysteria.proulx.com>
X-GNU-PR-Message: followup 9252
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: notabug

Danilo,

> Thanks for replying so quickly. Now I understand what cut was doing with my
> string. :)
> I'm Brazilian and my English is very, very weak.
>
> > Note that it says "internationalization /will/ change that", which
> > means it will change in the future.  It is a future-tense assertion.
>
> This is the proof. I read it but did not pay attention to the "will". hehe
>
> Once more, thanks for replying.

Happy to help!

Bob
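
The byte-versus-character distinction explained in the thread can be checked with a small shell sketch. It is only an illustration under stated assumptions: the input is UTF-8, and the variable names (s, bytes, chars, first5) are made up for this example. It uses `cut -b` rather than `cut -c`, because the behaviour of `-c` on multibyte input depends on the coreutils version and on distribution patches.

```shell
#!/bin/sh
# Sketch only: compare byte and character counts for the string in the report.
# The octal escapes \303\241 are the UTF-8 bytes c3 a1 (the accented "a")
# seen in the od output, so the script does not depend on the file encoding.
s=$(printf 'd\303\241nilo')

# wc -c counts bytes.
bytes=$(printf '%s' "$s" | wc -c | tr -d ' ')

# In UTF-8, every byte in the range 0x80-0xBF (octal 200-277) is a
# continuation byte; dropping those leaves one byte per character.
chars=$(printf '%s' "$s" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' ')

# cut -b selects by byte, so the first five bytes hold only four characters.
first5=$(printf '%s\n' "$s" | cut -b -5)

echo "bytes=$bytes chars=$chars"        # bytes=7 chars=6
printf '%s\n' "$first5" | od -An -tx1   # bytes 64 c3 a1 6e 69 0a
```

The seven bytes versus six characters account for the result in the report: selecting positions 1-5 by byte stops after "dáni", one character short of what a character-aware count would give.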