From unknown Tue Aug 19 05:11:06 2025
From: bug#56350 <56350@debbugs.gnu.org>
To: bug#56350 <56350@debbugs.gnu.org>
Subject: Status: UTF-8 LC_CTYPE bug esp when a certain range of Korean characters
Reply-To: bug#56350 <56350@debbugs.gnu.org>
Date: Tue, 19 Aug 2025 12:11:06 +0000

retitle 56350 UTF-8 LC_CTYPE bug esp when a certain range of Korean characters
reassign 56350 grep
submitter 56350 git@taeyeob.kim
severity 56350 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Jul 02 05:29:07 2022
Date: Sat, 02 Jul 2022 12:44:29 +0900
From: KIM Taeyeob
To: bug-grep@gnu.org
Subject: UTF-8 LC_CTYPE bug esp when a certain range of Korean characters
Reply-To: git@taeyeob.kim

Grep (and also Sed) cannot match a certain range of Korean characters when it runs under LC_CTYPE=C.UTF-8 (or any other locale with UTF-8 encoding, including en_US.UTF-8, ko_KR.UTF-8, ja_JP.UTF-8, etc.).

To reproduce the bug:

$ export LC_CTYPE=C.UTF-8
$ echo 폿 | grep .
폿    <-- a character in the range [가-폿] (~) is matched without any issue
$ echo 퐀 | grep .
$     <-- but a character in the range [퐀-힣] (~) CANNOT be matched, although it IS SUPPOSED TO be matched.

Sed has the same issue with the period regex.

Sed example:

$ export LC_CTYPE=C.UTF-8
$ echo "폿" | sed -e 's/./a/'
a    <-- matched and replaced without an issue
$ echo "퐀" | sed -e 's/./a/'
퐀    <-- FAILED to match, so it is not replaced

I think it is related to Glibc, but I couldn't find a way to reproduce the bug with it directly, so I am reporting it against Grep instead.
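
[Editorial note] The two characters in the report above are adjacent code points: 폿 is U+D3FF and 퐀 is U+D400. The sketch below prints their UTF-8 bytes, which shows that the boundary between the matching and non-matching ranges coincides with a change in the second byte (0x8F to 0x90) after the lead byte 0xED. The helper name utf8_hex is made up for this illustration, and this is only an observation about the encoding; the thread itself does not say which bytes the Gnulib fix concerns.

```shell
#!/bin/sh
# utf8_hex is a hypothetical helper for this note, not part of grep or sed.
# It prints the bytes of its argument as space-separated hex.
utf8_hex() {
  # od dumps the raw bytes; unquoted expansion normalizes od's spacing.
  set -- $(printf '%s' "$1" | od -An -tx1)
  echo "$*"
}

utf8_hex '폿'   # last character that matched in the report: ed 8f bf
utf8_hex '퐀'   # first character that did not match:        ed 90 80
```

The script assumes it is saved as UTF-8. Because it only inspects bytes, its output does not depend on LC_CTYPE or on the installed grep version.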
From debbugs-submit-bounces@debbugs.gnu.org Sat Jul 02 17:28:54 2022
Message-ID: <6dc73457-0b41-ce63-c4c1-9c329848c766@cs.ucla.edu>
Date: Sat, 2 Jul 2022 16:28:40 -0500
Subject: Re: bug#56350: UTF-8 LC_CTYPE bug esp when a certain range of Korean characters
To: git@taeyeob.kim, 김태엽
From: Paul Eggert
Cc: 56350-done@debbugs.gnu.org, 56352-done@debbugs.gnu.org

Thanks, that's a Gnulib bug that was fixed here:

https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=b19a10775e54f8ed17e3a8c08a72d261d8c26244

This has been propagated to GNU Grep and the fix should appear in the next Grep release. I plan to reply separately about GNU Sed.

From debbugs-submit-bounces@debbugs.gnu.org Sat Jul 02 17:35:33 2022
Date: Sat, 2 Jul 2022 16:35:24 -0500
To: control@debbugs.gnu.org
From: Paul Eggert
Subject: 56350 and 56352 are the same

merge 56350 56352

From unknown Tue Aug 19 05:11:06 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sun, 31 Jul 2022 11:24:07 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator