From unknown Tue Aug 19 14:47:54 2025
From: KIM Taeyeob
To: 56351@debbugs.gnu.org
Reply-To: git@taeyeob.kim
Subject: bug#56351: LC_CTYPE=C.UTF-8 causes an matching error on Sed
Date: Sat, 02 Jul 2022 14:03:10 +0900
Message-ID: <070f213d3b146ae5585bdb6c800cfb2d@taeyeob.kim>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

Sed (and also Grep) cannot match a certain range of Korean characters
with the '.' regex when running under LC_CTYPE=C.UTF-8 (or any other
UTF-8 locale, including en_US.UTF-8, ko_KR.UTF-8, and ja_JP.UTF-8).

Reproducing the bug with Sed:

    $ export LC_CTYPE=C.UTF-8
    $ echo 폿 | sed -e 's/./a/'
    a      <-- matched and replaced without an issue
    $ echo 퐀 | sed -e 's/./a/'
    퐀     <-- FAILED to match, so nothing is replaced

In detail, a character in the range [가-폿] (U+AC00 to U+D3FF) is
matched without any issue, but a character in the range [퐀-힣]
(U+D400 to U+D7A3) CANNOT be matched, even though it IS SUPPOSED TO be.

Grep has the same issue with the period regex.

Reproducing the bug with Grep:

    $ export LC_CTYPE=C.UTF-8
    $ echo 폿 | grep .
    폿     <-- matched successfully
    $ echo 퐀 | grep .
    $      <-- failed to match

I think it is related to … or … in Glibc, but I couldn't find a way to
reproduce the bug with those directly, so I am reporting it against Sed
instead. I have also reported this issue on the bug-grep list.

From unknown Tue Aug 19 14:47:54 2025
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: git@taeyeob.kim
Reply-To: 56351@debbugs.gnu.org
Subject: bug#56351: closed (LC_CTYPE=C.UTF-8 causes an matching error on Sed)
Date: Sat, 02 Jul 2022 22:58:02 +0000
References: <070f213d3b146ae5585bdb6c800cfb2d@taeyeob.kim>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----------=_1656802682-29775-1"

This is a multi-part message in MIME format...
------------=_1656802682-29775-1
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"

Your bug report

#56351: LC_CTYPE=C.UTF-8 causes an matching error on Sed

which was filed against the sed package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 56351@debbugs.gnu.org.

-- 
56351: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=56351
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1656802682-29775-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

From: Paul Eggert
To: git@taeyeob.kim
Cc: 56351-done@debbugs.gnu.org
Subject: LC_CTYPE=C.UTF-8 causes an matching error on Sed
Date: Sat, 2 Jul 2022 17:57:18 -0500
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Thanks for reporting that. This bug was introduced in Sed 4.8. I
propagated the Gnulib fix into the Sed development tree, here:

https://git.savannah.gnu.org/cgit/sed.git/commit/?id=bfdc4d6ee4811c34d8756fcca7895f5d2eed6946
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=49c90357b9a07fc78904660f68c2e6acd236da9d

and the bug should be fixed in the next Sed release.
------------=_1656802682-29775-1--
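The boundary in the report lines up with the UTF-8 encoding of the characters involved. The sketch below prints the bytes of the first and last syllables of each reported range. It assumes three things: a POSIX shell, the standard od and xargs utilities, and a script file saved as UTF-8. utf8_bytes is a hypothetical helper defined only for this illustration. Every Hangul syllable is three bytes long. 폿 (U+D3FF) and 퐀 (U+D400) sit exactly where the second byte after the lead byte 0xED moves from 0x8F to 0x90.

```shell
#!/bin/sh
# utf8_bytes (hypothetical helper, for illustration only): print the
# bytes of $1 as space-separated lowercase hex pairs. od -An -tx1 dumps
# raw bytes without offsets; xargs collapses od's padding into single
# spaces.
utf8_bytes() {
  printf '%s' "$1" | od -An -tx1 | xargs
}

# First and last syllables of the two ranges from the report.
# Assumes this script file itself is saved as UTF-8.
for ch in 가 폿 퐀 힣; do
  printf '%s  %s\n' "$ch" "$(utf8_bytes "$ch")"
done
```

The script prints ea b0 80, ed 8f bf, ed 90 80, and ed 9e a3 for the four characters. Byte sequences beginning 0xED 0xA0 through 0xED 0xBF would encode the surrogates U+D800 to U+DFFF, which UTF-8 forbids. The valid second bytes after 0xED are therefore 0x80 to 0x9F. The reported symptom is consistent with a matcher that accepted only 0x80 to 0x8F in that position. However, the messages above do not describe what the Gnulib fix changed, so this is an inference rather than a description of the patch.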