From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 01 20:53:04 2015
Date: Wed, 2 Sep 2015 00:41:09 +0000 (UTC)
From: Michael Lee
To: "bug-coreutils@gnu.org"
Message-ID: <1569154567.83126.1441154469654.JavaMail.yahoo@mail.yahoo.com>
Subject: Bug with cut and Spanish characters from text file with UTF-8 encoding

To whom it may concern:

To preface the explanation of this possible bug, the following was tested:

Encoding(s) was/were determined by opening the Spanish text files with vi and using ":set" to view the encoding type(s).

Text files containing Spanish letters/characters were used in this test. First, the locale in the bash shell was set to UTF-8 (default setting with Ubuntu) and the first test file was encoded in Latin1. Under these conditions head and tail were used to try to output several Spanish letters/characters with accents above the letter. Trying to use "head spanish.txt" and "tail spanish.txt" resulted in output with spaces in place of the Spanish letters/characters.

After spanish.txt was converted from Latin1 to UTF-8 with iconv, the test was repeated with the head and tail utilities and then the output was correct. The Spanish letters/characters then displayed correctly instead of what previously appeared to be blank spaces. When the "cut" command was added to this, the behavior of spaces taking the place of letters returned.

For example, "head -n 50 spanish.txt | cut -c 1" or "tail -n 50 spanish.txt | cut -c 1" will result in the first character showing only blank spaces where there are Spanish letters/characters. Letters with accents are displayed as blank spaces. Using only head or tail will show the Spanish letters correctly, but not with the cut command.

When using cut as "cut -c 1" with a text file with Spanish characters, it does not display those characters.

For example, the character ã or á will not display if it is the first character and the file is trimmed using the cut command.

Converting the file from Latin1 to UTF-8 solved the problem with head and tail, but not cut.

The cut command does not seem to output the special letters/characters correctly.

Is there an environment variable that could fix this or could it possibly be a bug?

Thank you for your time.

Sincerely,
Michael Lee
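The head and tail behavior described in this report fits with those tools copying bytes through unchanged while the terminal decodes them as UTF-8. The following is a minimal Python sketch of that idea, not the code of head, tail or any terminal. The sample word "árbol" and the helper name render_as_utf8 are assumptions made for illustration, and U+FFFD stands in for whatever blank or placeholder glyph a particular terminal draws.

```python
# Sample text; any word starting with an accented letter would do.
text = "árbol"

latin1_bytes = text.encode("latin-1")  # á is the single byte 0xE1
utf8_bytes = text.encode("utf-8")      # á is the two bytes 0xC3 0xA1


def render_as_utf8(raw: bytes) -> str:
    """Decode bytes the way a UTF-8 terminal would, replacing
    undecodable bytes with U+FFFD (hypothetical helper)."""
    return raw.decode("utf-8", errors="replace")


# Latin1 bytes shown on a UTF-8 terminal: the accented letter is lost.
print(repr(render_as_utf8(latin1_bytes)))

# The same text stored as UTF-8 displays intact.
print(repr(render_as_utf8(utf8_bytes)))

# Rough equivalent of converting the file from Latin1 to UTF-8 with iconv.
converted = latin1_bytes.decode("latin-1").encode("utf-8")
print(converted == utf8_bytes)
```

In this model the Latin1 file only looks wrong because the display layer expects UTF-8, which is why converting the file was enough to fix the head and tail output.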
From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 02 07:03:15 2015
Date: Wed, 02 Sep 2015 12:03:10 +0100
From: Pádraig Brady
To: Michael Lee, 21395@debbugs.gnu.org
Message-ID: <55E6D76E.5070009@draigBrady.com>
In-Reply-To: <1569154567.83126.1441154469654.JavaMail.yahoo@mail.yahoo.com>
Subject: Re: bug#21395: Bug with cut and Spanish characters from text file with UTF-8 encoding

On 02/09/15 01:41, Michael Lee wrote:
> When using cut as "cut -c 1" with a text file with Spanish characters, it does not display those characters.
> For example, the character ã or á will not display if it is the first character and the file is trimmed using the cut command.

Debian and Ubuntu do not apply the i18n patch that Fedora, RHEL and SUSE use, for example, so cut there does not handle multi-byte characters.
That i18n patch is problematic and incomplete, though, and there are plans to bring the functionality upstream at some stage:
http://www.pixelbeat.org/docs/coreutils_i18n/

cheers,
Pádraig

From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 24 17:10:45 2018
Date: Wed, 24 Oct 2018 15:10:35 -0600
From: Assaf Gordon
To: control@debbugs.gnu.org

severity 21395 wishlist
retitle 21395 multibyte: cut and Spanish characters
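
The reply in this thread attributes the behavior to cut lacking multi-byte support on Debian and Ubuntu. A minimal Python sketch of the difference between selecting the first byte and selecting the first character follows. It models the two behaviors only and is not taken from the coreutils source or the i18n patch; the function names cut_first_byte and cut_first_char and the sample word are hypothetical.

```python
def cut_first_byte(line: bytes) -> bytes:
    """Model of a byte-oriented "cut -c 1": keep only the first byte."""
    return line[:1]


def cut_first_char(line: bytes, encoding: str = "utf-8") -> bytes:
    """Model of a multi-byte-aware "cut -c 1": decode the line, keep the
    first character, and encode it again."""
    return line.decode(encoding)[:1].encode(encoding)


line = "árbol".encode("utf-8")  # á is stored as 0xC3 0xA1

# Byte-oriented selection splits the character and emits a lone lead byte,
# which a UTF-8 terminal cannot display.
print(cut_first_byte(line))

# Character-oriented selection keeps both bytes of the character.
print(cut_first_char(line))

# For plain ASCII input the two models agree.
print(cut_first_byte(b"tree") == cut_first_char(b"tree"))
```

Under this model, converting the file to UTF-8 cannot help a byte-oriented cut, because the accented letter then occupies two bytes and only the first of them is selected.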