GNU bug report logs - #22237
sed no longer removes high-ascii characters as it did formerly.
Reported by: Brian Tew <montanalag <at> gmail.com>
Date: Fri, 25 Dec 2015 18:52:02 UTC
Severity: normal
Tags: notabug
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22237 in the body.
You can then email your comments to 22237 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-sed <at> gnu.org: bug#22237; Package sed. (Fri, 25 Dec 2015 18:52:02 GMT)
Acknowledgement sent to Brian Tew <montanalag <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Fri, 25 Dec 2015 18:52:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Well, sometimes it do and sometimes it don't.
Script started on Fri 25 Dec 2015 05:53:04 AM CS
~$ed sample
50
l
subject now that thanksgiving has come and gone\342\246$
q
~$
~$sed -i 's/[^a-z 0-9]//g' sample
~$ed sample
50
l
subject now that thanksgiving has come and gone\342\246$
q
~$
~$sed --version
sed (GNU sed) 4.2.2
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Jay Fenlason, Tom Lord, Ken Pizzini,
and Paolo Bonzini.
GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed <at> gnu.org>.
Be sure to include the word ``sed'' somewhere in the ``Subject:'' field.
~$exit
Script done on Fri 25 Dec 2015 05:59:12 AM CS
Reply sent to Jim Meyering <jim <at> meyering.net>: You have taken responsibility. (Sat, 26 Dec 2015 21:20:01 GMT)
Notification sent to Brian Tew <montanalag <at> gmail.com>: bug acknowledged by developer. (Sat, 26 Dec 2015 21:20:01 GMT)
Message #10 received at 22237-done <at> debbugs.gnu.org (full text, mbox):
On Fri, Dec 25, 2015 at 4:21 AM, Brian Tew <montanalag <at> gmail.com> wrote:
> Well, sometimes it do and sometimes it don't.
>
> Script started on Fri 25 Dec 2015 05:53:04 AM CS
> ~$ed sample
> 50
> l
> subject now that thanksgiving has come and gone\342\246$
> q
> ~$
> ~$sed -i 's/[^a-z 0-9]//g' sample
To remove all but the matched bytes, you probably want something like
this instead:
LC_ALL=C sed -i 's/[^[:alnum:] ]//g' sample
Note I've done two things: used LC_ALL=C to override your default
locale (probably a UTF-8 one), and used [:alnum:] in place of the
nonportable a-z and 0-9 ranges.
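The suggestion above can be tried on a scratch copy of the data. The following is a minimal sketch, not part of the original reply: it assumes a POSIX shell with mktemp available, recreates the reported line in a temporary file (a stand-in for `sample`), and omits -i so nothing is edited in place.

```shell
# Scratch file standing in for the reporter's "sample" (an assumption for this sketch).
tmp=$(mktemp)

# Recreate the reported line; \342\246 are the two stray bytes, as octal escapes.
printf 'subject now that thanksgiving has come and gone\342\246\n' > "$tmp"

# In the C locale every byte is a single character, so the bracket
# expression matches each stray byte and the g flag removes all of them.
cleaned=$(LC_ALL=C sed 's/[^[:alnum:] ]//g' "$tmp")
printf '%s\n' "$cleaned"

rm -f "$tmp"
```

Note that [:alnum:] also keeps uppercase letters, which the original a-z range did not. In the C locale, where ranges follow byte order, 's/[^a-z 0-9]//g' should behave as the reporter intended.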
In general, with UTF-8-based locales, a byte sequence like your
\342\246 will match no regular expression, since it is not a valid
UTF-8 character.
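One way to check whether a byte sequence is valid UTF-8 is to pass it through iconv, which exits with a nonzero status on malformed input. This is a rough sketch that assumes iconv is installed. The helper name is_utf8 is hypothetical, and the three-byte sequence \342\200\246 (U+2026) is included only as a well-formed comparison, not as a claim about what the file originally contained.

```shell
# Hypothetical helper: reads stdin and reports whether it decodes as UTF-8.
is_utf8() {
    if iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        echo valid
    else
        echo invalid
    fi
}

# The two bytes from the report: a 3-byte lead byte (\342) followed by
# only one continuation byte (\246), so the sequence is incomplete.
stray=$(printf '\342\246\n' | is_utf8)

# A complete 3-byte sequence for comparison (U+2026, horizontal ellipsis).
whole=$(printf '\342\200\246\n' | is_utf8)

echo "stray bytes: $stray"
echo "complete sequence: $whole"
```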
What probably changed is that older versions of sed did not properly
handle multi-byte locales, or your other experience was using a
single-byte locale.
If you still think there is a problem with sed-4.2.2, please provide
more detail and I'll reopen this issue.
Added tag(s) notabug. Request was from Jim Meyering <jim <at> meyering.net> to control <at> debbugs.gnu.org. (Sun, 27 Dec 2015 18:25:02 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 25 Jan 2016 12:24:04 GMT)
This bug report was last modified 9 years and 204 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.