GNU bug report logs - #20678
new bug that Paul "asked" for... grep -P aborts on non-utf8 input.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20678 in the body.
You can then email your comments to 20678 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#20678; Package coreutils. (Wed, 27 May 2015 21:42:02 GMT)
Acknowledgement sent to "L. A. Walsh" <coreutils <at> tlinx.org>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 27 May 2015 21:42:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
(skip to end if you don't care to read how I found this
mess)...
Paul Eggert wrote:
> Linda Walsh wrote:
>
>> I had one file that it bailed on
>> saying it has an invalid UTF-8 encoding -- but the line was
>> recursive starting from '.' -- and it didn't name the file
>
> That's pretty vague. Can you reproduce that problem? I don't observe
> it:
----
I'm not quite *sure* how to tell someone else to reproduce this, but
I can now pretty reliably get some output from a checker....:
*** file = libvtkUtilitiesPythonInitializer-pv4.2.so.1
grep: invalid UTF-8 byte sequence in input
-----
*** file = libvtkPVClientServerCoreCore-pv4.2.so.1
grep: invalid UTF-8 byte sequence in input
-----
*** file = libsystemd.so.0
grep: invalid UTF-8 byte sequence in input
-----
*** file = libvtkParallelCore-pv4.2.so.1
grep: invalid UTF-8 byte sequence in input
-----
Now, before you think I'm too daft: the code that produces those
messages is in Perl and is:
for my $k (@sorted_missing) {
    P "*** file = %s", $k;
    open(my $gh, "grep -rP '/$k' /home/rpms/13.2|");
    while (<$gh>) {
        print
    }
    P "-----";
}
Those files are ones that came up "missing" as pre-reqs.
In /home/rpms/...., I have the *file listings* of each of
the rpms, created in the same structure as in the distro, so
each listing is a file under /home/rpms/13.2. This is why I had
a problem finding it:
Ishtar:rpms/13.2/repo/oss/suse> file -bi x86_64/*>/tmp/x86files.txt
Ishtar:rpms/13.2/repo/oss/suse> sort </tmp/x86files.txt |uniq -c
2 text/plain; charset=iso-8859-1
13269 text/plain; charset=us-ascii
2 text/plain; charset=utf-8
--- I'd say it's likely 1-2 files out of 13273 files that could
have the problem. Yeah, I run into a lot of needles in haystacks..
but trying to find the needle... just generating the file of types:
> time file -i x86_64/*>/tmp/fullx86files.txt
27.71sec 27.07usr 0.63sys (99.99% cpu)
Then grep helps!
Ishtar:rpms/13.2/repo/oss/suse> grep iso-88 /tmp/fullx86files.txt
x86_64/aspell-is-0.51.10-46.1.2.x86_64.rpm:text/plain; charset=iso-8859-1
x86_64/aspell-nb-0.50.10-46.1.2.x86_64.rpm:text/plain; charset=iso-8859-1
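A more direct way to get the file *names* is to have `iconv` try to read each file as UTF-8 and print the ones it rejects. This is an illustration, not what was used in this thread. It assumes an `iconv` (such as glibc's) that exits non-zero on an invalid input sequence. `find_invalid_utf8` is a hypothetical helper name.

```shell
# find_invalid_utf8 FILE...
# Hypothetical helper: print the name of every FILE that iconv cannot read
# as UTF-8. Converting UTF-8 to UTF-8 leaves valid input unchanged, so iconv
# only fails on an invalid or truncated byte sequence (or when the file
# cannot be opened, in which case its name is printed as well).
find_invalid_utf8() {
    for f in "$@"; do
        if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `find_invalid_utf8 x86_64/*` would be expected to list the two aspell listings reported as iso-8859-1 above, since their Latin-1 bytes are not valid UTF-8.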
---
Ishtar:rpms/13.2/repo/oss/suse> more x86_64/aspell-is-0.51.10-46.1.2.x86_64.rpm
/usr/lib64/aspell-0.60/icelandic.alias
/usr/lib64/aspell-0.60/is.dat
/usr/lib64/aspell-0.60/is.multi
/usr/lib64/aspell-0.60/is.rws
/usr/lib64/aspell-0.60/is_phonet.dat
/usr/lib64/aspell-0.60/355slenska.alias <<-- the 355 was in inverse color
/usr/share/doc/packages/aspell-is
/usr/share/doc/packages/aspell-is/COPYING
/usr/share/doc/packages/aspell-is/Copyright
/usr/share/doc/packages/aspell-is/README
----
Same with the other file (it had this 1 'violation'):
/usr/lib64/aspell-0.60/bokmal.alias
/usr/lib64/aspell-0.60/bokm345l.alias <-3
So those are 'octal' code points (using a little calc prog):
> pcalc
pcalc V0.1.8: Type 'constants' to see constants
(1)> 0355
= 237 (0x00ed) "í"
(2)> 0345
= 229 (0x00e5) "å"
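The pcalc results can be cross-checked with the shell's own `printf`, which reads a numeric argument with a leading 0 as octal. The second half is a sketch that assumes an `iconv` with an ISO-8859-1 converter. It shows why these bytes upset a UTF-8 decoder. Each character is a single byte in ISO-8859-1 but takes two bytes in UTF-8. A lone 0355 or 0345 byte looks like the start of a three-byte UTF-8 sequence, and the ASCII letter that follows it is not a valid continuation byte.

```shell
# 0355 octal = 3*64 + 5*8 + 5 = 237 = 0xed ("í" in ISO-8859-1)
# 0345 octal = 3*64 + 4*8 + 5 = 229 = 0xe5 ("å" in ISO-8859-1)
printf '%d 0x%02x\n' 0355 0355    # prints: 237 0xed
printf '%d 0x%02x\n' 0345 0345    # prints: 229 0xe5

# Re-encode each single Latin-1 byte as UTF-8 and dump the bytes in octal.
printf '\355' | iconv -f ISO-8859-1 -t UTF-8 | od -An -to1    # od shows: 303 255
printf '\345' | iconv -f ISO-8859-1 -t UTF-8 | od -An -to1    # od shows: 303 245
```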
-------------------------------------------------------------------------------
So the 1st part of the bug is that the error message doesn't name the file.
The 2nd part of the bug is this: looking for '^nobody' in
"/etc/passwd" works, as shown in this 1st example:
> grep -P '^nobody' /etc/passwd
nobody:x:65534:65533:(group Nobody):/var/lib/nobody:/bin/nologin
but the 'error' aborts any further file searches, so /etc/passwd is
never searched when the bad file comes first:
---
> grep -P '^nobody' x86_64/aspell-is-0.51.10-46.1.2.x86_64.rpm /etc/passwd
grep: invalid UTF-8 byte sequence in input
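Until an updated grep/libpcre is in place, one workaround is to invoke grep once per file. An error on one file is then reported with its name and does not stop the remaining files from being searched. This sketch was not proposed in the thread. `grep_each` is a hypothetical helper name, and the sketch relies on GNU grep's exit statuses (0 = a line matched, 1 = no match, 2 = error).

```shell
# grep_each PATTERN FILE...
# Hypothetical helper: run "grep -P" separately on each FILE, so an error on
# one file (grep exit status 2) is reported with that file's name and does
# not stop the remaining files from being searched.
# Returns 0 if any line matched, 1 if none did, 2 if any file caused an error.
# (Variables are global here because POSIX sh has no "local".)
grep_each() {
    pat=$1
    shift
    status=1
    for f in "$@"; do
        rc=0
        grep -H -P -e "$pat" -- "$f" || rc=$?
        if [ "$rc" -eq 0 ]; then
            [ "$status" -eq 2 ] || status=0
        elif [ "$rc" -gt 1 ]; then
            printf 'grep -P failed on: %s\n' "$f" >&2
            status=2
        fi
    done
    return "$status"
}
```

On an affected grep, `grep_each '^nobody' x86_64/aspell-is-0.51.10-46.1.2.x86_64.rpm /etc/passwd` should name the .rpm listing on stderr and still print the /etc/passwd match. Another option is to run grep in a unibyte locale (e.g. `LC_ALL=C grep -P ...`). GNU grep only enables PCRE's UTF-8 mode in a UTF-8 locale, so the input is then matched as raw bytes. The cost is that `.` and character classes match single bytes rather than characters.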
----------------------------------------------------------
This is why I objected to a '\000' byte making grep treat a file as
binary (and why I think it's bad that grep can't look for that):
if one works with Windows, text is far more likely just to be in
UTF-16 encoding, where every ASCII character carries a '\000' byte.
-l
Information forwarded to bug-coreutils <at> gnu.org: bug#20678; Package coreutils. (Wed, 27 May 2015 22:05:02 GMT)
Message #8 received at submit <at> debbugs.gnu.org:
On 05/27/2015 02:41 PM, L. A. Walsh wrote:
> *** file = libvtkUtilitiesPythonInitializer-pv4.2.so.1
> grep: invalid UTF-8 byte sequence in input
This looks like you're using an old version of libpcre, or of grep. I
can't reproduce the problem with the latest stable versions of both
(libpcre 8.37, grep-2.21). I can find similar problems if I use old
libpcre.
Information forwarded to bug-coreutils <at> gnu.org: bug#20678; Package coreutils. (Wed, 27 May 2015 22:25:02 GMT)
Message #11 received at submit <at> debbugs.gnu.org:
Paul Eggert wrote:
> On 05/27/2015 02:41 PM, L. A. Walsh wrote:
>> *** file = libvtkUtilitiesPythonInitializer-pv4.2.so.1
>> grep: invalid UTF-8 byte sequence in input
>
> This looks like you're using an old version of libpcre, or of grep. I
> can't reproduce the problem with the latest stable versions of both
> (libpcre 8.37, grep-2.21). I can find similar problems if I use old
> libpcre.
---
ok... ARG -- I just installed the new version of grep from
my distro (suse13.2) -- grep-2.20-2.4.1.x86_64
I think they'll be out with a new distro release in about
a year...(yes, I can probably build my own...like I have
to with a growing body of Software) -- something that has
gotten me in trouble with my distro at times when I've caught
them locking different pieces of software to specific
libraries (not >= xxx but "==")... grrr...I could acknowledge
their point that most people wouldn't bother rebuilding
all the perl modules if they upgraded perl... but that's
not *everyone*!...sigh.
coreutils isn't as stable as it used to be (and that's not entirely the
CU-devel team's fault either: I've caught suse's hand in 1-2)...
Just ran into problems in their new gvim & sudo -- I think
the sudo prob is the sudo-dev team's...but the gvim one I filed
a bug on in the previous version... guess it didn't get fixed.
Filing bugs more often than not is a big waste of time.
*grump* *grump*...
;-)
bug closed, send any further explanations to 20678 <at> debbugs.gnu.org and "L. A. Walsh" <coreutils <at> tlinx.org>
Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Wed, 27 May 2015 22:45:03 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#20678; Package coreutils. (Thu, 28 May 2015 07:18:03 GMT)
Message #16 received at 20678 <at> debbugs.gnu.org:
On 05/28/2015 12:24 AM, Linda Walsh wrote:
> ok... ARG -- I just installed the new version of grep from
> my distro (suse13.2) -- grep-2.20-2.4.1.x86_64
>
> I think they'll be out with a new distro release in about
> a year...(yes, I can probably build my own...like I have
> to with a growing body of Software)
This is openSUSE specific.
When you've built your own version with a patch for a problem,
nothing prevents you from simply creating a submit request for
that patch on OBS to "Base:System/grep", and maybe even creating
a maintenance request for "openSUSE:13.2/grep". Get involved.
Have a nice day,
Berny
Information forwarded to bug-coreutils <at> gnu.org: bug#20678; Package coreutils. (Thu, 28 May 2015 13:18:02 GMT)
Message #19 received at 20678 <at> debbugs.gnu.org:
Bernhard Voelker wrote:
> On 05/28/2015 12:24 AM, Linda Walsh wrote:
>> ok... ARG -- I just installed the new version of grep from
>> my distro (suse13.2) -- grep-2.20-2.4.1.x86_64
>>
>> I think they'll be out with a new distro release in about
>> a year...(yes, I can probably build my own...like I have
>> to with a growing body of Software)
>
> This is openSUSE specific.
> When you've built your own version with a patch for a problem,
> nothing prevents you from simply creating a submit request for
> that patch on OBS to "Base:System/grep", and maybe even creating
> a maintenance request for "openSUSE:13.2/grep". Get involved.
----
The main thing my patch does is restore the functionality of 'rm'
that allowed "rm -fr ."; I'm not daft enough to try to sneak that in
as a default. Maybe in a different command, maybe as a non-default,
but I'm anything but duplicitous (unfortunately).
I _have_ always thought that a shorthand combination of rd and rm
might be nice -- maybe 'r'... Of course it would only work like rmdir
on empty dirs unless the user specifies the "-r" flag so it could remove
contents first. And of course it would pay attention to the POSIX rule
about not trying to delete '.' after it finished its depth-first traversal...
But no one else seems to really care that much, so I'm not sure how much
effort I want to put into it to package something like that up. But
it has entered my mind...
Cheers,
Lina
Information forwarded to bug-coreutils <at> gnu.org: bug#20678; Package coreutils. (Thu, 28 May 2015 13:30:08 GMT)
Message #22 received at 20678 <at> debbugs.gnu.org:
On 05/28/2015 03:17 PM, Linda Walsh wrote:
> Bernhard Voelker wrote:
>> On 05/28/2015 12:24 AM, Linda Walsh wrote:
>>> ok... ARG -- I just installed the new version of grep from
>>> my distro (suse13.2) -- grep-2.20-2.4.1.x86_64
>>>
>>> I think they'll be out with a new distro release in about
>>> a year...(yes, I can probably build my own...like I have
>>> to with a growing body of Software)
>>
>> This is openSUSE specific.
>> When you've built your own version with a patch for a problem,
>> nothing prevents you from simply creating a submit request for
>> that patch on OBS to "Base:System/grep", and maybe even creating
>> a maintenance request for "openSUSE:13.2/grep". Get involved.
> ----
> Main thing my patch is restoring functionality of 'rm'
> to allow "rm -fr ." [...]
Stop: 'rm -rf .' is a completely different story; your bug report
was about 'grep -P' (which is off-topic on the coreutils
mailing list, btw.).
Having your own (probably not generally wanted) patches in your own
OBS project is perfect. I was just talking about submitting the
patch for the non-utf8 issue (for which I personally didn't check
whether it is already included downstream).
Have a nice day,
Berny
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 26 Jun 2015 11:24:05 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.