From debbugs-submit-bounces@debbugs.gnu.org Sat Jul 02 05:29:07 2022
Date: Sat, 02 Jul 2022 14:03:10 +0900
From: KIM Taeyeob
To: bug-sed@gnu.org
Subject: LC_CTYPE=C.UTF-8 causes an matching error on Sed
Message-ID: <070f213d3b146ae5585bdb6c800cfb2d@taeyeob.kim>
Reply-To: git@taeyeob.kim

Sed (and also Grep) cannot match a certain range of Korean characters
when it operates under LC_CTYPE=C.UTF-8 (and any language environment
with UTF-8 encoding, including en_US.UTF-8, ko_KR.UTF-8, ja_JP.UTF-8,
etc.).

Reproducing the bug on Sed:

  $ export LC_CTYPE=C.UTF-8
  $ echo 폿 | sed -e 's/./a/'
  a     <-- matched and replaced without an issue
  $ echo 퐀 | sed -e 's/./a/'
  퐀    <-- FAILED to match, so it is not replaced

In detail, a character in the range [가-폿] (U+AC00 ~ U+D3FF) is
matched without any issue, but a character in the range [퐀-힣]
(U+D400 ~ U+D7A3) CANNOT be matched, although it IS SUPPOSED TO be
matched.

Grep has the same issue with the period regex.

Reproducing the bug on Grep:

  $ export LC_CTYPE=C.UTF-8
  $ echo 폿 | grep .
  폿    <-- matched successfully
  $ echo 퐀 | grep .
  $     <-- failed to match

I think it is related with or on Glibc, but I couldn't find a way to
reproduce the bug with those, so I am reporting it on Sed instead.
I have also reported this issue on the bug-grep list.

From debbugs-submit-bounces@debbugs.gnu.org Sat Jul 02 18:57:29 2022
Date: Sat, 2 Jul 2022 17:57:18 -0500
From: Paul Eggert
To: git@taeyeob.kim
Cc: 56351-done@debbugs.gnu.org
Subject: LC_CTYPE=C.UTF-8 causes an matching error on Sed

Thanks for reporting that. This bug was introduced in Sed 4.8. I
propagated the Gnulib fix into the Sed development tree, here:

https://git.savannah.gnu.org/cgit/sed.git/commit/?id=bfdc4d6ee4811c34d8756fcca7895f5d2eed6946
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=49c90357b9a07fc78904660f68c2e6acd236da9d

and the bug should be fixed in the next Sed release.

From unknown Tue Aug 19 10:03:34 2025
Date: Sun, 31 Jul 2022 11:24:08 +0000
From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control

# This is a fake control message.
#
# The action:
# bug archived.
thanks
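
Editorial note on the original report: the boundary between the matching character (폿) and the non-matching one (퐀) can be inspected at the byte level. The sketch below only prints the UTF-8 encoding of the four boundary characters named in the report; `utf8_hex` is a hypothetical helper name introduced here, not part of Sed or Grep, and the script assumes it is saved and run as UTF-8 text. It does not reproduce the bug itself, and the thread does not state the root cause, so the byte pattern is an observation rather than a diagnosis.

```shell
#!/bin/sh
# utf8_hex: hypothetical helper (not part of sed or grep).
# Prints the bytes of its argument as lowercase hex with no separators.
utf8_hex() {
    # od dumps raw bytes regardless of locale; tr strips spacing and newlines.
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# First and last characters of the two ranges named in the report.
for ch in 가 폿 퐀 힣; do
    printf '%s %s\n' "$ch" "$(utf8_hex "$ch")"
done
# Prints:
# 가 eab080
# 폿 ed8fbf
# 퐀 ed9080
# 힣 ed9ea3
```

The last matching character and the first failing one share the lead byte 0xED and differ in the second byte (0x8F versus 0x90), which may help when checking whether a given Sed or Grep build is affected.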