GNU bug report logs - #56351
LC_CTYPE=C.UTF-8 causes a matching error in Sed
Reported by: git <at> taeyeob.kim
Date: Sat, 2 Jul 2022 09:30:03 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Report forwarded to bug-sed <at> gnu.org: bug#56351; Package sed. (Sat, 02 Jul 2022 09:30:03 GMT)
Acknowledgement sent to git <at> taeyeob.kim: New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Sat, 02 Jul 2022 09:30:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Sed (and also Grep) cannot match a certain range of Korean characters
when it runs under LC_CTYPE=C.UTF-8 (or any other locale with UTF-8
encoding, such as en_US.UTF-8, ko_KR.UTF-8, or ja_JP.UTF-8).
Reproducing the bug with Sed:
$ export LC_CTYPE=C.UTF-8
$ echo 폿 | sed -e 's/./a/'
a <-- matched and replaced without an issue
$ echo 퐀 | sed -e 's/./a/'
퐀 <-- FAILED to match, so nothing is replaced
In detail, a character in the range [가-폿] (<UAC00>~<UD3FF>) is
matched without any issue, but a character in the range [퐀-힣]
(<UD400>~<UD7A3>) is NOT matched even though it SHOULD be.
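The boundary between the two ranges can also be inspected at the byte level. The following is a minimal sketch, not part of the original report, that dumps the UTF-8 encoding of the last matching character and the first failing one. The helper name `utf8_hex` is hypothetical, and the script assumes it is saved as UTF-8.

```shell
# Hypothetical helper: print the bytes of its argument as lowercase hex.
# Uses only printf, od, and tr; assumes this file is encoded as UTF-8.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d '[:space:]'
}

last_ok=$(utf8_hex '폿')     # U+D3FF, last character reported as matching
first_bad=$(utf8_hex '퐀')   # U+D400, first character reported as failing

echo "U+D3FF: $last_ok"      # prints: U+D3FF: ed8fbf
echo "U+D400: $first_bad"    # prints: U+D400: ed9080
```

Both characters share the lead byte `ed`; the second byte changes from `8f` to `90` at the boundary. This is only an observation about the encoding, not a diagnosis of the cause.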
Grep has the same issue with the period regex.
Reproducing the bug with Grep:
$ export LC_CTYPE=C.UTF-8
$ echo 폿 | grep .
폿 <-- matched successfully
$ echo 퐀 | grep .
$ <-- failed to match
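To check the endpoints of both ranges in one pass, the manual steps above can be wrapped in a small loop. This is a sketch under assumptions: the helper name `check_dot` is hypothetical, a C.UTF-8 locale must be installed, and the result depends on the grep build in use, so no expected output is shown for the Korean characters.

```shell
# Hypothetical helper: report whether the regex '.' matches one character.
# LC_ALL is used instead of LC_CTYPE so that a pre-existing LC_ALL setting
# cannot override the locale under test.
check_dot() {
    if printf '%s\n' "$1" | LC_ALL=C.UTF-8 grep -q .; then
        echo "match"
    else
        echo "no match"
    fi
}

# First and last character of each range named in the report.
for ch in 가 폿 퐀 힣; do
    printf '%s: %s\n' "$ch" "$(check_dot "$ch")"
done
```

On an affected grep, the last two characters would be expected to report "no match", following the behaviour described in this report.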
I think it is related to <regex.h> or <iconv.h> in Glibc, but I
couldn't find a way to reproduce the bug with those, so I am
reporting it against Sed instead.
I am also reporting this issue to the bug-grep list.
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Sat, 02 Jul 2022 22:58:01 GMT)
Notification sent to git <at> taeyeob.kim: bug acknowledged by developer. (Sat, 02 Jul 2022 22:58:02 GMT)
Message #10 received at 56351-done <at> debbugs.gnu.org:
Thanks for reporting that. This bug was introduced in Sed 4.8. I
propagated the Gnulib fix into the Sed development tree, here:
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=bfdc4d6ee4811c34d8756fcca7895f5d2eed6946
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=49c90357b9a07fc78904660f68c2e6acd236da9d
and the bug should be fixed in the next Sed release.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 31 Jul 2022 11:24:08 GMT)
This bug report was last modified 3 years and 20 days ago.