From unknown Tue Jun 17 20:18:28 2025
From: bug#13362 <13362@debbugs.gnu.org>
To: bug#13362 <13362@debbugs.gnu.org>
Subject: Status: multibyte: tr: TR operates on bytes, not characters
Reply-To: bug#13362 <13362@debbugs.gnu.org>
Date: Wed, 18 Jun 2025 03:18:28 +0000

retitle 13362 multibyte: tr: TR operates on bytes, not characters
reassign 13362 coreutils
submitter 13362 Urs Thuermann
severity 13362 wishlist
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 05 12:27:22 2013
To: bug-coreutils@gnu.org
Subject: tr does not work with UTF-8 locales
From: Urs Thuermann
Date: 05 Jan 2013 12:53:00 +0100
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7

The tr utility from coreutils-8.20 does not handle multi-byte
characters in UTF-8 correctly.  It seems the arguments and standard
input are read byte-by-byte instead of character-by-character.

Here are two examples, using the following UTF-8 characters (which are
also available in latin1, since this is what my mail software still
uses):

    ä (c3 a4), ö (c3 b6), ü (c3 bc), ¼ (c2 bc), ½ (c2 bd)

1. A call to tr -d ü does not delete that two-byte sequence from the
   input but deletes any occurrence of c3 or bc:

   urs@bit:~/coreutils-8.20$ locale
   LANG=C.UTF-8
   LANGUAGE=
   LC_CTYPE="C.UTF-8"
   LC_NUMERIC="C.UTF-8"
   LC_TIME="C.UTF-8"
   LC_COLLATE="C.UTF-8"
   LC_MONETARY="C.UTF-8"
   LC_MESSAGES="C.UTF-8"
   LC_PAPER="C.UTF-8"
   LC_NAME="C.UTF-8"
   LC_ADDRESS="C.UTF-8"
   LC_TELEPHONE="C.UTF-8"
   LC_MEASUREMENT="C.UTF-8"
   LC_IDENTIFICATION="C.UTF-8"
   LC_ALL=
   urs@bit:~/coreutils-8.20$ echo äöü¼|od -tx1
   0000000 c3 a4 c3 b6 c3 bc c2 bc 0a
   0000011
   urs@bit:~/coreutils-8.20$ echo äöü¼|tr -d ü|od -tx1
   0000000 a4 b6 c2 0a
   0000004

2. Replacing the single character ü (c3 bc) by the single character ½
   (c2 bd) instead replaces each c3 by c2 and each bc by bd:

   urs@bit:~/coreutils-8.20$ echo äöü¼|tr ü ½|od -tx1
   0000000 c2 a4 c2 b6 c2 bd c2 bd 0a
   0000011

urs

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 06 07:23:25 2013
Message-ID: <50E96CA0.4030802@draigBrady.com>
Date: Sun, 06 Jan 2013 12:22:56 +0000
From: Pádraig Brady
To: Urs Thuermann
Subject: Re: bug#13362: tr does not work with UTF-8 locales
Cc: 13362@debbugs.gnu.org
Sender: debbugs-submit-bounces@debbugs.gnu.org

forcemerge 13362 9365
thanks

On 01/05/2013 11:53 AM, Urs Thuermann wrote:
> The tr utility from coreutils-8.20 does not handle multi-byte
> characters in UTF-8 correctly.  It seems the arguments and standard
> input are read byte-by-byte instead of character-by-character.

We all agree that this is an issue.
Someone just needs to get the time to implement it.

thanks,
Pádraig.

From debbugs-submit-bounces@debbugs.gnu.org Fri Jun 27 13:06:58 2014
From: Ganton
To: 13362@debbugs.gnu.org
Subject: GNU bug report logs - #13362 tr does not work with UTF-8 locales
Date: Fri, 27 Jun 2014 19:01:14 +0200
Message-Id: <201406271901.14362.kubry@gmx.com>

Dear sirs:

This bug has been causing errors for many years (at least twelve (!)
[https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139861]), and let's
face it, if we don't change the point of view it will never get solved.
Meanwhile, the effects of this bug will keep on damaging the work of
Linux users, and our reputation.

sed can work with UTF-8 correctly. What about asking the sed developers
for help? The sed developers could even refactor the tr code so that
sed code could be used, so at least this bug would not keep on causing
errors for Linux users. Moreover, the sed developers may find a better
solution than that one.

Thank you.

From debbugs-submit-bounces@debbugs.gnu.org Mon Oct 15 10:06:07 2018
To: control@debbugs.gnu.org
From: Assaf Gordon
Date: Mon, 15 Oct 2018 08:05:56 -0600

severity 9365 wishlist
retitle 9365 multibyte: tr: TR operates on bytes, not characters
retitle 9446 cp: acl preservation problem on FreeBSD 8.1
severity 9472 wishlist
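
The retitled summary, "TR operates on bytes, not characters", can be
illustrated with a small Python sketch. It does not call tr and says
nothing about how tr is implemented; it only models the two
interpretations of a set (a set of bytes versus a set of characters)
on the input from the original report. All helper names are
hypothetical and exist only for this illustration.

```python
# Sketch only: models byte-wise vs character-wise set semantics.
# It does not invoke tr; every helper name here is hypothetical.

def delete_bytewise(data: bytes, charset: bytes) -> bytes:
    """Drop every byte of data that occurs anywhere in charset."""
    return bytes(b for b in data if b not in charset)

def translate_bytewise(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Map the i-th byte of set1 to the i-th byte of set2."""
    return data.translate(bytes.maketrans(set1, set2))

def delete_charwise(text: str, chars: str) -> str:
    """Drop every whole character of text that occurs in chars."""
    return text.translate({ord(c): None for c in chars})

def translate_charwise(text: str, set1: str, set2: str) -> str:
    """Map the i-th character of set1 to the i-th character of set2."""
    return text.translate(str.maketrans(set1, set2))

def hexdump(data: bytes) -> str:
    """Space-separated hex bytes, similar to the od -tx1 columns."""
    return data.hex(" ")

text = "äöü¼\n"            # the input used in the report
data = text.encode("utf-8")  # c3 a4 c3 b6 c3 bc c2 bc 0a

# Byte-wise: "ü" is the two bytes c3 bc, each treated on its own.
bytewise_deleted = hexdump(delete_bytewise(data, "ü".encode("utf-8")))
bytewise_mapped = hexdump(
    translate_bytewise(data, "ü".encode("utf-8"), "½".encode("utf-8"))
)

# Character-wise: "ü" is one character, so only it is affected.
charwise_deleted = hexdump(delete_charwise(text, "ü").encode("utf-8"))
charwise_mapped = hexdump(translate_charwise(text, "ü", "½").encode("utf-8"))

print(bytewise_deleted)  # a4 b6 c2 0a
print(bytewise_mapped)   # c2 a4 c2 b6 c2 bd c2 bd 0a
print(charwise_deleted)  # c3 a4 c3 b6 c2 bc 0a
print(charwise_mapped)   # c3 a4 c3 b6 c2 bd c2 bc 0a
```

The two byte-wise results match the od output quoted in the original
report; the two character-wise results are what the reporter expected
to see.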