GNU bug report logs - #22128
dirname: accept file list input
Report forwarded to bug-coreutils <at> gnu.org: bug#22128; Package coreutils. (Thu, 10 Dec 2015 01:09:02 GMT)
Acknowledgement sent to "Nellis, Kenneth" <Kenneth.Nellis <at> xerox.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 10 Dec 2015 01:09:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I frequently need to extract the `dirname's from a list of files,
so dirname should have an option to take its input from a
file, e.g.:
dirname -f <filename>
where <filename> could be "-" for stdin.
E.g., to get a list of directories that contain a specific
file:
find -name "xyz.dat" | dirname -f -
The same would be good for `basename' as well.
--Ken Nellis
Information forwarded to bug-coreutils <at> gnu.org: bug#22128; Package coreutils. (Thu, 10 Dec 2015 01:47:02 GMT)
Message #8 received at 22128 <at> debbugs.gnu.org (full text, mbox):
tag 22128 notabug
close 22128
stop
On 09/12/15 17:31, Nellis, Kenneth wrote:
> I frequently need to extract the `dirname's from a list of files,
> so dirname should have an option to take its input from a
> file, e.g.:
>
> dirname -f <filename>
xargs dirname < filename
> where <filename> could be "-" for stdin.
>
> E.g., to get a list of directories that contain a specific
> file:
>
> find -name "xyz.dat" | dirname -f -
find -name "xyz.dat" -print0 | xargs -r0 dirname
> The same would be good for `basename' as well.
xargs basename -a < filename
thanks,
Pádraig.
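A minimal sketch of the xargs approach suggested above, assuming GNU coreutils 8.16 or later (so that dirname accepts several operands) and a list whose names contain no whitespace, quotes, or backslashes. The temporary directory and the sample paths are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: a newline-separated list of paths.
tmp=$(mktemp -d)
printf '%s\n' /usr/lib/libc.so /etc/hosts src/main.c > "$tmp/list"

# xargs appends the listed names to the dirname command line.
# Passing several operands to dirname is a GNU extension.
dirs=$(xargs dirname < "$tmp/list")
printf '%s\n' "$dirs"

# GNU basename needs -a to accept more than one operand.
bases=$(xargs basename -a < "$tmp/list")
printf '%s\n' "$bases"

rm -rf "$tmp"
```

Because xargs applies its own quoting rules to the input, this form is only suitable for lists with simple names.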
Information forwarded to bug-coreutils <at> gnu.org: bug#22128; Package coreutils. (Thu, 10 Dec 2015 16:35:01 GMT)
Message #11 received at 22128 <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady wrote:
> Nellis, Kenneth wrote:
> > E.g., to get a list of directories that contain a specific file:
> >
> > find -name "xyz.dat" | dirname -f -
>
> find -name "xyz.dat" -print0 | xargs -r0 dirname
Also, if you are using GNU find, you can use its -printf action with
%h to print the directory of each matching item. This is not portable
to non-GNU systems.
find . -name xyz.dat -printf "%h\n"
It can also generate NUL-terminated output for further use with xargs -0.
find . -name xyz.dat -printf "%h\0" | xargs -0 ...otherstuff...
Bob
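The -printf approach can be tried on a throwaway tree. This is only a sketch and assumes GNU find and GNU xargs; the directory layout is invented for the example.

```shell
#!/bin/sh
# Hypothetical tree with two matching files and one non-matching file.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b" "$tmp/c"
: > "$tmp/a/b/xyz.dat"
: > "$tmp/c/xyz.dat"
: > "$tmp/c/other.dat"

# %h expands to the leading directories of each match (GNU find only).
found=$(cd "$tmp" && find . -name xyz.dat -printf '%h\n' | LC_ALL=C sort)
printf '%s\n' "$found"

# NUL-terminated variant for xargs -0; here the consumer only counts
# the arguments it received.
count=$(cd "$tmp" && find . -name xyz.dat -printf '%h\0' |
        xargs -r0 sh -c 'echo "$#"' sh)
printf 'directories: %s\n' "$count"

rm -rf "$tmp"
```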
Information forwarded to bug-coreutils <at> gnu.org: bug#22128; Package coreutils. (Thu, 10 Dec 2015 16:50:02 GMT)
Message #14 received at 22128 <at> debbugs.gnu.org (full text, mbox):
Thanx. Hadn't yet discovered GNU find's -printf option.
Still, my -f suggestion would be easier to type,
but I welcome your alternatives.
-Ken
Information forwarded to bug-coreutils <at> gnu.org: bug#22128; Package coreutils. (Thu, 10 Dec 2015 17:41:02 GMT)
Message #17 received at 22128 <at> debbugs.gnu.org (full text, mbox):
Nellis, Kenneth wrote:
> Still, my -f suggestion would be easier to type,
> but I welcome your alternatives.
Here is the problem. You would like dirname to read a list from a
file. Someone else will want it to read a file that lists other
list files. Another will want to skip one header line. Another will want
to skip multiple header lines. Another will want the exact same
feature in basename too. Another will want file name modification so
that it can be used to rename directories. And on and on and on.
Trying to put every possible combination of feature into every utility
leads to unmanageable code bloat.
What do all of those have in common? They are all specific features
that are easily available by using the features of the operating
system. That is the entire point of a Unix-like operating system. It
already has all of the tools needed. You tell it what you want it to
do using those features. That is the way the operating system is
designed. Utilities such as dirname are simply small pieces in the
complete solution.
In this instance the first thing I thought of when I read your dirname
-f request was a loop.
while read dir; do dirname $dir; done < list
Pádraig suggested xargs which was even shorter.
xargs dirname < filename
Both of those directly do exactly what you had asked to do. The
technique works not only with dirname but with every other command on
the system too. A technique that works with everything is much better
than something that only works in one small place.
Want to get the basename instead?
while read dir; do basename $dir; done < list
Want to modify the result to add a suffix?
while read dir; do echo $dir.myaddedsuffix; done < list
Want to modify the name in some custom way?
while read dir; do echo $dir | sed 's/foo/bar/'; done < list
Want a sorted unique list modified in some custom way?
while read dir; do echo $dir | sed 's/foo/bar/'; done < list | sort -u
The possibilities are endless and as they say limited only by your
imagination. Anything you can think of doing you can tell the system
to do it for you. Truly a marvelous thing to be so empowered.
Note that to be completely general and work with arbitrary names,
including names with embedded newlines, proper quoting is required,
and the wisdom of today says to always use null-terminated strings.
But if you are using a file of names then I assume you are operating
on a restricted and sane set of characters, so this won't matter to you.
I do that all of the time.
Bob
Information forwarded to bug-coreutils <at> gnu.org: bug#22128; Package coreutils. (Thu, 10 Dec 2015 17:47:01 GMT)
Message #20 received at 22128 <at> debbugs.gnu.org (full text, mbox):
I got it. You don't like the idea. That's fine. Please close the ticket.
--Ken
Information forwarded to bug-coreutils <at> gnu.org: bug#22128; Package coreutils. (Fri, 11 Dec 2015 17:28:01 GMT)
Message #23 received at submit <at> debbugs.gnu.org (full text, mbox):
2015-12-10 10:40:30 -0700, Bob Proulx:
[...]
> In this instance the first thing I thought of when I read your dirname
> -f request was a loop.
>
> while read dir; do dirname $dir; done < list
"read dir" expects the input in a very specific format and
depends on the current value of IFS (like a dir called "my\dir "
has to be input as "my\\dir\ " with the default value of IFS)
and can't accept dir names with newline characters.
Invoking the split+glob operator on $dir doesn't make sense here
unless you mean the input to be treated as a $IFS delimited list
of patterns.
If the intention was to treat the input as a list of file
paths, one per line (so can't do file paths with newline
characters), then that would rather be:
while IFS= read -r dir; do dirname -- "$dir"; done < list
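The difference between the two loops can be seen on a single awkward name. The sketch below assumes GNU dirname (the unquoted loop ends up passing two operands) and uses a made-up path whose directory name contains a backslash and ends in a space.

```shell
#!/bin/sh
# Hypothetical list with one path: directory "my\dir " holding "file".
tmp=$(mktemp -d)
printf '%s\n' 'my\dir /file' > "$tmp/list"

# Plain read treats the backslash as an escape character, and the
# unquoted $dir is then split on the space into two separate operands.
naive=$(while read dir; do dirname $dir; done < "$tmp/list")

# IFS= keeps surrounding whitespace, -r keeps backslashes, the quotes
# stop word splitting and globbing, and -- guards against a leading "-".
robust=$(while IFS= read -r dir; do dirname -- "$dir"; done < "$tmp/list")

printf 'naive:  [%s]\n' "$naive"
printf 'robust: [%s]\n' "$robust"

rm -rf "$tmp"
```

Neither loop can represent a path containing a newline, since both read one line at a time.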
>
> Pádraig suggested xargs which was even shorter.
>
> xargs dirname < filename
That expects yet another input format. This time, it can cope
with any file path, since a newline can be specified using quotes
like:
"my dir
with newline"
The output of dirname however won't be post-processable.
> Both of those directly do exactly what you had asked to do. The
> technique works not only with dirname but with every other command on
> the system too. A technique that works with everything is much better
> than something that only works in one small place.
The while loop you can't reasonably do for large file lists as
running one dirname invocation per file is going to be
prohibitive in terms of performance.
The xargs approach, you can do only with GNU dirname as it
supports passing more than one string as an extension over the
standard.
I think here we're seeing the limits of shell scripting. OK,
dirname is the tool to get a dirname, but doing it in a loop is
not practical/efficient and produces an ambiguous output (not to
mention that file names are not necessarily valid text so the
passing of that data through text utilities can be a problem).
Extending all the utilities so that they can take their list of
arguments from stdin instead of the command line is one solution
(and one already applied by several GNU utilities, like
--files0-from in du/sort/wc), but I agree xargs -r0 is a more
generic solution and good enough for things like dirname since
the number of invocations is minimised.
The --files0-from options of du/sort/wc are justified because
xargs -r0 wouldn't work (several invocations of the utilities
could end up being made, which wouldn't work for them), but not
for dirname. (I'd argue ls would need one for its sorting though,
and an option to output NUL-delimited names.)
That can't be applied for commands that take only one argument
like basename though.
GNU xargs addresses the problem of the stdin of the command
being redirected (like for rm -i) with its --arg-file option.
The problem with dirname is that OK, GNU dirname can take
several paths as arguments but then its output is not
post-processable reliably ("dirname a/b a/c" and "dirname
$'a\na/b'" produce the same output for instance).
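The ambiguity is easy to reproduce. This sketch assumes GNU dirname, since passing two operands is a GNU extension, and builds the newline-containing name with printf so that it also runs in shells without the $'...' syntax.

```shell
#!/bin/sh
# Two operands, each a plain name.
two=$(dirname a/b a/c)

# One operand whose first path component contains a newline: "a<NL>a/b".
nl_path=$(printf 'a\na/b')
one=$(dirname -- "$nl_path")

# Both results are the two lines "a" and "a", so a consumer of the
# newline-delimited output cannot tell one input from two.
if [ "$two" = "$one" ]; then
    echo "outputs are identical"
fi
```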
Here using another programming language/paradigm that has the
"dirname" capability and can deal with list of strings reliably
within the same command (like perl or zsh) would be a more
reliable and efficient approach.
zsh:
files=(${(0)"$(<file.list)"}) # read a NUL delimited list
print -rN -- $files:h # print the dirnames as NUL delimited
print -rN -- ${(u)files:h} # same for unique items.
perl -MFile::Basename -0 -lpe '$_ = dirname $_' < file.list
Those can handle strings with any byte values. For the shell
pipelines, you have issues with NUL/NL, and depending on the
tool, invalid characters, long lines, things starting with "-",
things containing "="...
> Want a sorted unique list modified in some custom way?
>
> while read dir; do echo $dir | sed 's/foo/bar/'; done < list | sort -u
[...]
I would recommend the reading of
https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice
Here, I'd do:
< list sed -z 's/foo/bar/' | LC_ALL=C sort -zu
Assuming a NUL delimited list in "list".
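As a rough illustration of that pipeline, the sketch below builds a small NUL-delimited list, including one name with an embedded newline, and runs it through the same commands. It assumes GNU sed and GNU sort for the -z options; the file names are invented.

```shell
#!/bin/sh
# Hypothetical NUL-delimited list with a duplicate and a name that
# contains a newline.
tmp=$(mktemp -d)
nl_name=$(printf 'foo/new\nline')
printf '%s\0' foo/one foo/one foo/two "$nl_name" > "$tmp/list"

# -z makes sed and sort treat NUL, not newline, as the record separator.
< "$tmp/list" sed -z 's/foo/bar/' | LC_ALL=C sort -zu > "$tmp/out"

# Count the NUL terminators to get the number of unique records.
records=$(tr -cd '\0' < "$tmp/out" | wc -c)
printf 'unique records: %s\n' "$records"

rm -rf "$tmp"
```

The name with the embedded newline stays a single record, which a newline-delimited pipeline could not guarantee.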
--
Stephane
Information forwarded to bug-coreutils <at> gnu.org: bug#22128; Package coreutils. (Fri, 11 Dec 2015 18:39:01 GMT)
Message #26 received at 22128 <at> debbugs.gnu.org (full text, mbox):
On 11/12/15 14:46, Stephane Chazelas wrote:
> 2015-12-10 10:40:30 -0700, Bob Proulx:
> [...]
>> In this instance the first thing I thought of when I read your dirname
>> -f request was a loop.
>>
>> while read dir; do dirname $dir; done < list
>
> "read dir" expects the input in a very specific format and
> depends on the current value of IFS (like a dir called "my\dir "
> has to be input as "my\\dir\ " with the default value of IFS)
> and can't accept dir names with newline characters.
>
> Invoking the split+glob operator on $dir doesn't make sense here
> unless you mean the input to be treated as a $IFS delimited list
> of patterns.
>
> If the intention was to treat the input as a list of file
> paths, one per line (so can't do file paths with newline
> characters), then that would rather be:
>
> while IFS= read -r dir; do dirname -- "$dir"; done < list
>
>>
>> Pádraig suggested xargs which was even shorter.
>>
>> xargs dirname < filename
>
> That expects yet another input format. That time, it can cope
> with any file path, since newline can be specified using quotes
> like:
>
> "my dir
> with newline"
>
> The output of dirname however won't be post-processable.
Both GNU basename and dirname since 8.16 (2012) got
the -z option to make the _output_ post-processable,
along with support for processing multiple inputs.
xargs splits arguments on the _input_ appropriately.
In general xargs is fine for this when the tool
doesn't need to process all inputs at once
(like sorting or generating a total for example).
cheers,
Pádraig.
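A short sketch of the combination described above: xargs -0 splits the input on NUL and dirname -z terminates each output record with NUL. It assumes GNU xargs and GNU coreutils 8.16 or later; the sample names are made up and include one with an embedded newline.

```shell
#!/bin/sh
# Hypothetical NUL-delimited input, including the name "a<NL>a/b".
tmp=$(mktemp -d)
nl_path=$(printf 'a\na/b')
printf '%s\0' a/b a/c "$nl_path" > "$tmp/list0"

# -r: do not run dirname at all on empty input; -0: NUL-separated input.
xargs -r0 dirname -z -- < "$tmp/list0" > "$tmp/dirs0"

# Three inputs yield three NUL-terminated records, even though one of
# the results itself contains a newline.
records=$(tr -cd '\0' < "$tmp/dirs0" | wc -c)
printf 'output records: %s\n' "$records"

rm -rf "$tmp"
```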
Information forwarded to bug-coreutils <at> gnu.org: bug#22128; Package coreutils. (Fri, 11 Dec 2015 22:09:01 GMT)
Message #29 received at 22128 <at> debbugs.gnu.org (full text, mbox):
2015-12-11 18:38:15 +0000, Pádraig Brady:
[...]
> Both GNU basename and dirname since 8.16 (2012) got
> the -z option to make the _output_ post-processable,
> along with support for processing multiple inputs.
[...]
Indeed.
And I can see GNU basename takes a -a option to accept more than
one file path (and take the suffix to strip with -s which is
how basename should have been designed in the first place) so
one can do:
< list xargs -r0 basename -az --
Or
< list xargs -r0 basename -azs .txt --
and have a minimum number of invocations of basename and a
post-processable output.
All good. Thanks.
--
Stephane
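For illustration, the basename form can be exercised on a tiny NUL-delimited list. This is a sketch assuming GNU basename and GNU xargs; the names are hypothetical, and one starts with "-" to show why the trailing -- matters.

```shell
#!/bin/sh
# Hypothetical NUL-delimited list of paths ending in .txt.
tmp=$(mktemp -d)
printf '%s\0' docs/a.txt docs/b.txt -dash.txt > "$tmp/list0"

# -a: several operands, -z: NUL-terminated output, -s .txt: strip suffix.
# tr is used only to make the result readable on a terminal.
names=$(xargs -r0 basename -azs .txt -- < "$tmp/list0" | tr '\0' '\n')
printf '%s\n' "$names"

rm -rf "$tmp"
```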
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 24 Oct 2018 21:36:01 GMT)
Added tag(s) wontfix. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 24 Oct 2018 21:36:01 GMT)
Changed bug title to 'dirname: accept file list input' from 'dirname enhancement'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 24 Oct 2018 21:36:01 GMT)
Bug closed, send any further explanations to 22128 <at> debbugs.gnu.org and "Nellis, Kenneth" <Kenneth.Nellis <at> xerox.com>. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 24 Oct 2018 21:36:01 GMT)
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 22 Nov 2018 12:24:06 GMT)