GNU bug report logs - #10355
Add an option to {md5,sha*} to ignore directories

Package: coreutils;

Reported by: "Gilles Espinasse" <g.esp <at> free.fr>

Date: Fri, 23 Dec 2011 13:47:02 UTC

Severity: wishlist

Tags: moreinfo, notabug, wontfix

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 10355 in the body.
You can then email your comments to 10355 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-coreutils <at> gnu.org:
bug#10355; Package coreutils. (Fri, 23 Dec 2011 13:47:02 GMT)

Acknowledgement sent to "Gilles Espinasse" <g.esp <at> free.fr>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 23 Dec 2011 13:47:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Gilles Espinasse" <g.esp <at> free.fr>
To: <bug-coreutils <at> gnu.org>
Subject: Add an option to {md5,sha*} to ignore directories
Date: Fri, 23 Dec 2011 14:45:10 +0100
I was using the following loop to compute md5sums for a lot of files:
 for myfile in `cat ${ALLFILES}`; do if [ -f /${myfile} ]; then md5sum /$myfile >> ${ALLFILES}.md5; fi; done

But this is slow compared with the xargs md5sum approach:
time (for myfile in `cat ${ALLFILES}`; do if [ -f /${myfile} ]; then md5sum /$myfile >> ${ALLFILES}.md5; fi; done)

real    0m26.907s
user    0m40.019s
sys     0m10.253s

Using xargs md5sum is much faster:
time (sed -e '/.\/$/d' -e 's|^.|/&|g' ${ALLFILES} | xargs md5sum >${ALLFILES}.md5)
md5sum: /etc/ipsec.d/cacerts: Is a directory
md5sum: /etc/ipsec.d/certs: Is a directory
md5sum: /etc/ipsec.d/crls: Is a directory
md5sum: /etc/ppp/chap-secrets: No such file or directory
md5sum: /etc/ppp/pap-secrets: No such file or directory
md5sum: /etc/squid/squid.conf: No such file or directory

real    0m1.176s
user    0m0.780s
sys     0m0.400s

That runs more than 20 times faster (26.9 s vs. 1.2 s of real time).
In the above example, I already skip most of the directories in the list by
removing lines that end with /, but not all directories in my list match
that condition.

So the fast solution emits errors and ends with status 123.
I know I could hide the error messages and the error status, but that starts
to get ugly:
sed -e '/.\/$/d' -e 's|^.|/&|g' ${ALLFILES} | xargs md5sum > ${ALLFILES}.md5 2>/dev/null || test $? -eq 123

Wouldn't it be great for {md5,sha*} to support an option to ignore directory
errors?
I may even be able to produce a patch if there is real interest.

Gilles





Information forwarded to bug-coreutils <at> gnu.org:
bug#10355; Package coreutils. (Fri, 23 Dec 2011 15:06:01 GMT)

Message #8 received at 10355 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Gilles Espinasse <g.esp <at> free.fr>
Cc: 10355 <at> debbugs.gnu.org
Subject: Re: bug#10355: Add an option to {md5,sha*} to ignore directories
Date: Fri, 23 Dec 2011 16:02:43 +0100
Hi Gilles,

On 12/23/2011 02:45 PM, Gilles Espinasse wrote:
>...
> In the above example, I already skipped most of the directories in the list,
> removing lines that end with / but not all directories in my list match on
> that condition.

How do you create the list of files to check?
You could use "find $DIR -type f" to list regular files only.

Erik




Information forwarded to bug-coreutils <at> gnu.org:
bug#10355; Package coreutils. (Fri, 23 Dec 2011 17:20:02 GMT)

Message #11 received at 10355 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: Gilles Espinasse <g.esp <at> free.fr>, 10355 <at> debbugs.gnu.org
Subject: Re: bug#10355: Add an option to {md5,sha*} to ignore directories
Date: Fri, 23 Dec 2011 10:17:22 -0700
severity 10355 wishlist
tags 10355 + notabug wontfix moreinfo
thanks

Erik Auerswald wrote:
> Gilles Espinasse wrote:
> >I was using a way to check md5sum on a lot of file using
> >  for myfile in `cat ${ALLFILES}`; do if [ -f /${myfile} ]; then md5sum
> >/$myfile>>  ${ALLFILES}.md5; fi; done
>...
> You could use "find $DIR -type f" to list regular files only.

Yes.  Exactly.  The capability you ask for is already present.

Please try this:

  find . -type f -exec md5sum {} +

Replace '.' above with a directory if you wish it to find files in a
different directory.

Bob




Severity set to 'wishlist' from 'normal' Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Fri, 23 Dec 2011 17:20:02 GMT)

Added tag(s) notabug, moreinfo, and wontfix. Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Fri, 23 Dec 2011 17:20:02 GMT)

Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Fri, 23 Dec 2011 17:51:01 GMT)

Notification sent to "Gilles Espinasse" <g.esp <at> free.fr>:
bug acknowledged by developer. (Fri, 23 Dec 2011 17:51:02 GMT)

Message #20 received at 10355-done <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Gilles Espinasse <g.esp <at> free.fr>
Cc: 10355-done <at> debbugs.gnu.org
Subject: Re: bug#10355: Add an option to {md5,sha*} to ignore directories
Date: Fri, 23 Dec 2011 17:48:05 +0000
On 12/23/2011 01:45 PM, Gilles Espinasse wrote:
>...
> Would it not be great to support an option in {md5,sha*} to ignore directory
> error?
> I may even be able to produce a patch if there is a real interest.
> 
> Gilles

I don't think this is worthwhile TBH, as it is too unusual.
One can easily exclude dirs from the source,
either trivially with find, or by filtering like this:

LANG=C xargs -d'\n' -r stat -L -c "%F:%n" < ${ALLFILES} | # decorate
sed '/^directory:/d; s/^[^:]*://' | # filter and undecorate
xargs -d'\n' md5sum # process

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#10355; Package coreutils. (Fri, 23 Dec 2011 17:55:02 GMT)

Message #23 received at 10355 <at> debbugs.gnu.org:

From: "Gilles Espinasse" <g.esp <at> free.fr>
To: "Bob Proulx" <bob <at> proulx.com>,
	<10355 <at> debbugs.gnu.org>
Subject: Re: bug#10355: Add an option to {md5,sha*} to ignore directories
Date: Fri, 23 Dec 2011 18:53:27 +0100
----- Original Message ----- 
From: "Bob Proulx" <bob <at> proulx.com>
To: "Gilles Espinasse" <g.esp <at> free.fr>; <10355 <at> debbugs.gnu.org>
Sent: Friday, December 23, 2011 6:17 PM
Subject: Re: bug#10355: Add an option to {md5,sha*} to ignore directories


> severity 10355 wishlist
> tags 10355 + notabug wontfix moreinfo
> thanks
>
> Erik Auerswald wrote:
> > Gilles Espinasse wrote:
> > >I was using a way to check md5sum on a lot of file using
> > >  for myfile in `cat ${ALLFILES}`; do if [ -f /${myfile} ]; then md5sum
> > >/$myfile>>  ${ALLFILES}.md5; fi; done
> >...
> > You could use "find $DIR -type f" to list regular files only.
>
Thanks for the suggestion.
ALLFILES is built indirectly using find $DIR, but I can't use -type f in that
find: the primary use of that list is to create a tar archive with
--files-from, and when a directory is empty, the directory name itself has to
be included.

> Yes.  Exactly.  The capability you ask for is already present.
>
> Please try this:
>
>   find . -type f -exec md5sum {} +
>
> Replace '.' above with a directory if you wish it to find files in a
> different directory.
>
> Bob

This does not work in my case either.
I don't want to calculate an md5 for every file found in my chroot.
I only care about a shorter list of files to be included in a tar.

The only alternative I found is derived from the slow version: instead of
running md5sum for each file, the loop appends the real file names to a list
that md5sum processes only once at the end:
rm -f /tmp/ALLFILES*
time (for myfile in $(sed -e 's/^dev.*//' -e 's/^sys.*//' ${ALLFILES}); do
  if [ -f /${myfile} ]; then echo /$myfile >> /tmp/ALLFILES; fi
done; xargs md5sum < /tmp/ALLFILES > /tmp/ALLFILES.md5)

real    0m1.967s
user    0m1.368s
sys     0m0.604s

This is about 70% slower than the fast version (1.97 s vs. 1.18 s of real
time) but does not require hiding errors (neither the directory messages nor
the exit status). That's fast enough for my needs, and it cuts the time of the
first slow version by more than a factor of 10 (26.9 s down to 2.0 s).

Gilles
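The loop above can be tightened into a single pipeline with no temporary file: the shell's own `[ -f ]` test filters the list (in shells such as bash and dash, where `[` and `printf` are builtins, so no process is forked per entry), and the surviving names are handed NUL-separated to one `xargs md5sum`. This is a minimal sketch, not code from the thread. It assumes GNU xargs (for `-0` and `-r`) and that the list holds one path per line relative to `/`, as in the report; the function name `md5_listed_files` is hypothetical.

```shell
# Sketch only: filter a newline-separated list of paths (relative to /, as
# in the report) down to regular files, then checksum them with one xargs.
# md5_listed_files is a hypothetical name; -0 and -r assume GNU xargs.
md5_listed_files() {
    list=$1   # input list, one path per line, e.g. "etc/passwd"
    out=$2    # file that receives the md5sum output
    # The "|| [ -n "$f" ]" part also handles a last line with no newline.
    while IFS= read -r f || [ -n "$f" ]; do
        case $f in
            dev*|sys*) continue ;;   # same exclusions as the sed in the loop
        esac
        # [ -f ] follows symlinks and is false both for directories and for
        # missing paths, so neither kind of entry ever reaches md5sum.
        if [ -f "/$f" ]; then
            printf '%s\0' "/$f"      # NUL-terminate so quotes/spaces are safe
        fi
    done < "$list" | xargs -0 -r md5sum > "$out"
}
```

Called as `md5_listed_files "${ALLFILES}" "${ALLFILES}.md5"`, it should need neither /tmp/ALLFILES nor the `|| test $? -eq 123` workaround. The trade-off is the same as in the loop: entries that are missing are skipped silently instead of being reported.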





Information forwarded to bug-coreutils <at> gnu.org:
bug#10355; Package coreutils. (Fri, 23 Dec 2011 23:02:02 GMT)

Message #26 received at 10355 <at> debbugs.gnu.org:

From: "Alan Curry" <pacman-cu <at> kosh.dhis.org>
To: bob <at> proulx.com (Bob Proulx)
Cc: 10355 <at> debbugs.gnu.org
Subject: Re: bug#10355: Add an option to {md5,sha*} to ignore directories
Date: Fri, 23 Dec 2011 17:58:51 -0500 (GMT+5)
Bob Proulx writes:
> 
> severity 10355 wishlist
> tags 10355 + notabug wontfix moreinfo
> thanks
> 
> Erik Auerswald wrote:
> > Gilles Espinasse wrote:
> > >I was using a way to check md5sum on a lot of file using
> > >  for myfile in `cat ${ALLFILES}`; do if [ -f /${myfile} ]; then md5sum
> > >/$myfile>>  ${ALLFILES}.md5; fi; done
> >...
> > You could use "find $DIR -type f" to list regular files only.
> 
> Yes.  Exactly.  The capability you ask for is already present.

Do you suppose we can convince GNU grep's maintainer to follow this
philosophy?

$ mkdir d
$ touch d/foo
$ grep foo *
$

It opens and reads, gets EISDIR, and intentionally skips printing it. Grr.

But wait, there's a -d option with 3 alternatives for what to do with
directories! ...and none of the choices is "just print the EISDIR so I'll know
if I accidentally grepped a directory".

-- 
Alan Curry




Information forwarded to bug-coreutils <at> gnu.org:
bug#10355; Package coreutils. (Sat, 24 Dec 2011 01:00:02 GMT)

Message #29 received at 10355 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: 10355 <at> debbugs.gnu.org
Subject: Re: bug#10355: Add an option to {md5,sha*} to ignore directories
Date: Fri, 23 Dec 2011 17:57:18 -0700
Alan Curry wrote:
> Do you suppose we can convince GNU grep's maintainer to follow this
> philosophy?

Too late.  GNU grep already has --recursive.  I think adding
--recursive to grep was a mistake.  It then requires most of 'find' to
be added to it too.  (--include*, --exclude*)

> $ mkdir d
> $ touch d/foo
> $ grep foo *
> $
> 
> It opens and reads, gets EISDIR, and intentionally skips printing it. Grr.

All silently.  For most cases I think your example would have been a
case of programming error.  It would be better to make those cases noisy.

The above seems to be a bug since it violates the documented action of
'read' for directories.  It appears to skip directories by default, even
when --directories=read is specified.

> But wait, there's a -d option with 3 alternatives for what to do with
> directories! ...and none of choices is "just print the EISDIR so I'll know
> if I accidentally grepped a directory".

And the problems just go on and on.

Bob




Information forwarded to bug-coreutils <at> gnu.org:
bug#10355; Package coreutils. (Sat, 24 Dec 2011 10:45:02 GMT)

Message #32 received at 10355 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Alan Curry <pacman-cu <at> kosh.dhis.org>
Cc: 10355 <at> debbugs.gnu.org, Bob Proulx <bob <at> proulx.com>
Subject: Re: bug#10355: Add an option to {md5,sha*} to ignore directories
Date: Sat, 24 Dec 2011 02:42:13 -0800
On 12/23/11 14:58, Alan Curry wrote:
> Do you suppose we can convince GNU grep's maintainer to follow this
> philosophy?

We definitely should.  I have filed a bug report (with patch) at
<https://savannah.gnu.org/bugs/index.php?35169>.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 21 Jan 2012 12:24:03 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.