From unknown Sun Jun 22 11:47:25 2025
Subject: bug#9569: :upper: and :lower: not working with tr and utf8 characters
From: Milos Sramek
To: 9569@debbugs.gnu.org
Date: Wed, 21 Sep 2011 21:47:55 +0200
Message-ID: <4E7A3F6B.6000207@gmail.com>
X-GNU-PR-Package: coreutils

Hello,

there seems to be a bug in tr: utf8
characters are not converted using :upper: and :lower:

For example:
$ echo lľsšcčtťzž | tr [:lower:] [:upper:]
LľSšCčTťZž

AWK does it correctly:
$ echo lľsšcčtťzž | awk '{ print toupper($0) }'
LĽSŠCČTŤZŽ

System used: Ubuntu 10.10. My locale settings are:
$ env locale
LANG=sk_SK.utf8
LC_CTYPE="sk_SK.utf8"
LC_NUMERIC=en_US.utf8
LC_TIME="sk_SK.utf8"
LC_COLLATE="sk_SK.utf8"
LC_MONETARY="sk_SK.utf8"
LC_MESSAGES="sk_SK.utf8"
LC_PAPER="sk_SK.utf8"
LC_NAME="sk_SK.utf8"
LC_ADDRESS="sk_SK.utf8"
LC_TELEPHONE="sk_SK.utf8"
LC_MEASUREMENT="sk_SK.utf8"
LC_IDENTIFICATION="sk_SK.utf8"
LC_ALL=

Observed on other systems, too.

Thank you,
Milos

-- 
email & jabber: sramek.milos@gmail.com

From unknown Sun Jun 22 11:47:25 2025
Subject: bug#9569: Duplicate of 9365
References: <4E7A3F6B.6000207@gmail.com>
In-Reply-To: <4E7A3F6B.6000207@gmail.com>
To: 9569@debbugs.gnu.org
Date: Fri, 24 Feb 2012 10:08:15 -0500
From: "Marton Kadar"
Message-ID: <20120224150815.107140@gmx.com>

This is the same error as http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9365
although this report gives some examples too.

From debbugs-submit-bounces@debbugs.gnu.org Sat Sep 15 06:30:00 2012
From: Jim Meyering
To: Michael Stummvoll
Cc: 12192@debbugs.gnu.org
Subject: Re: bug#12192: tr - bytes vs characters
In-Reply-To: <20120813145222.0450a1a8@eddie> (Michael Stummvoll's message of "Mon, 13 Aug 2012 14:52:22 +0200")
References: <20120813145222.0450a1a8@eddie>
Date: Sat, 15 Sep 2012 12:28:54 +0200
Message-ID: <87wqzvvau1.fsf@rho.meyering.net>

forcemerge 12192 9365
thanks

Michael Stummvoll wrote:
> Hi gnu folks,
>
> as already known, tr cannot handle multibyte encodings like utf-8:
>
>> mst@eddie:~$ echo "foo" | tr o ö
>> fÃÃ
>
> I know that multibyte encoding support is not needed for
> POSIX compliance, BUT:
>
> the manpage of tr says the following:
>
>> Translate, squeeze, and/or delete characters from standard input,
>> writing to standard output.
>
> and that's the inconsistency, imho.
>
> The typical interpretation of "character" in such a context means one
> character on display, regardless of which encoding is used or how many
> bytes are used to display it. So, if tr really translates "characters",
> it should preserve the encoding. If it doesn't, it does not
> translate "characters" but "bytes". So I see two ways:
>
> - add multibyte encoding support to tr
> or
> - change the manpage and help text to say "bytes" instead of "characters"
>
> Since it doesn't seem that anybody wants to add the support to tr, an
> update of the manpage would be the easier way to ensure consistency.

Thanks for the report.
I'm merging this issue with the others that relate
to tr and multi-byte support.
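
Editorial note, not part of any message in this thread: the byte-versus-character distinction raised in the reports above can be checked with a small shell sketch. The variable names are hypothetical, and LC_ALL=C is forced here as an assumption so that the byte-wise result is reproducible on any POSIX system; the reports describe the same outcome from GNU tr under UTF-8 locales.

```shell
# "ž" (U+017E) is two bytes in UTF-8: 0xC5 0xBE. Octal escapes are used so
# the example does not depend on the encoding of this file or the terminal.
z_caron=$(printf '\305\276')

# wc -c counts bytes, not characters.
byte_count=$(printf '%s' "$z_caron" | LC_ALL=C wc -c | tr -d ' ')
echo "bytes in z-caron: $byte_count"    # prints: bytes in z-caron: 2

# Byte-wise case conversion: only the ASCII "l" (0x6c) is mapped to "L" (0x4c).
# Neither 0xC5 nor 0xBE is a lowercase letter in the C locale, so both bytes
# pass through unchanged.
out=$(printf 'l\305\276' | LC_ALL=C tr '[:lower:]' '[:upper:]' | od -An -tx1 | tr -d ' \n')
echo "$out"                             # prints: 4cc5be
```

A character-aware conversion of the same input would instead be expected to produce the bytes 4c c5 bd, that is "L" followed by "Ž" (U+017D), which matches what the awk example in the original report prints.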
From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 06 07:23:25 2013
Message-ID: <50E96CA0.4030802@draigBrady.com>
Date: Sun, 06 Jan 2013 12:22:56 +0000
From: Pádraig Brady
To: Urs Thuermann
Cc: 13362@debbugs.gnu.org
Subject: Re: bug#13362: tr does not work with UTF-8 locales

forcemerge 13362 9365
thanks

On 01/05/2013 11:53 AM, Urs Thuermann wrote:
> The tr utility from coreutils-8.20 does not handle multi-byte
> characters in UTF-8 correctly. It seems the arguments and standard
> input are read byte-by-byte instead of character-by-character.

We all agree that this is an issue.
Someone just needs to find the time to implement it.

thanks,
Pádraig.

From debbugs-submit-bounces@debbugs.gnu.org Mon Oct 15 10:06:07 2018
To: control@debbugs.gnu.org
From: Assaf Gordon
Date: Mon, 15 Oct 2018 08:05:56 -0600

severity 9365 wishlist
retitle 9365 multibyte: tr: TR operates on bytes, not characters
retitle 9446 cp: acl preservation problem on FreeBSD 8.1
severity 9472 wishlist