GNU bug report logs - #7159
24.0.50; (1) `file-name-(non)directory': bad return values, (2) `directory-sep-char'


Package: emacs;

Reported by: "Drew Adams" <drew.adams <at> oracle.com>

Date: Mon, 4 Oct 2010 17:59:02 UTC

Severity: minor

Tags: wontfix

Found in version 24.0.50

Done: "Drew Adams" <drew.adams <at> oracle.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 7159 in the body.
You can then email your comments to 7159 AT debbugs.gnu.org in the normal way.




Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#7159; Package emacs. (Mon, 04 Oct 2010 17:59:02 GMT)

Acknowledgement sent to "Drew Adams" <drew.adams <at> oracle.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 04 Oct 2010 17:59:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Drew Adams" <drew.adams <at> oracle.com>
To: <bug-gnu-emacs <at> gnu.org>
Subject: 24.0.50; (1) `file-name-(non)directory': bad return values, (2)
	`directory-sep-char'
Date: Mon, 4 Oct 2010 10:59:12 -0700
emacs -Q
 
1.

(setq toto (wildcard-to-regexp "c:/foo/bar/b*.el"))
 gives "\\`c:/foo/bar/b[^^@]*\\.el\\'" (where ^@ is a control char)
 
(setq titi (substring toto 2))
 gives "c:/foo/bar/b[^^@]*\\.el\\'"  (^@ is a control char)
 
(file-name-absolute-p toto) ; -> t
(file-name-absolute-p titi) ; -> t
 
That is all as one would expect.  `file-name-absolute-p' has no
problem with either file-name string, even though neither string
is a legitimate file name and both contain a control char.
This is normal (IMO).
 
BUT:
 
(file-name-directory    titi) ; gives "c:/foo/bar/b[^^@]*\\.el\\"
(file-name-nondirectory titi) ; gives "'"
 
These functions should know how to parse titi to produce "c:/foo/bar/"
and "b[^^@]*\\.el\\'", respectively (where ^@ is the control char).
 
It is not expected that these functions return names that necessarily
map to actual directories or files.  What is expected is that they
remove the non-directory and directory components of the strings they
are passed.  That is not happening here.


Also:

(setq baz "c:/foo/bar/*\\.el\\'")
(file-name-nondirectory baz) ; gives "'"
(setq baz "c:/foo/bar/*\\.el\\ABC")
(file-name-nondirectory baz) ; gives "ABC"

So I suspect that the `file-name-nondirectory' part of this bug
is at least in part a Windows problem.  The code seems to be
interpreting the backslash (?\) near the end as a directory
separator.  If so, that is definitely wrong.  Even on Windows, the
code should use the value of `directory-sep-char', which is ?/,
not ?\.
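The behavior described above can be modeled with a small sketch. The helper `my-split-file-name` is hypothetical (it is not part of Emacs): it ignores drive specs and the other details the real C primitives handle, and simply splits a name after the last character found in SEPS. Counting both ?/ and ?\\ as separators reproduces the "'" result; counting only ?/ gives the split the report expects.

```elisp
;; Hypothetical helper, not part of Emacs: a simplified model of how
;; `file-name-directory' and `file-name-nondirectory' split a name.
;; It ignores drive specs ("c:") and everything else the C code does.
(defun my-split-file-name (name seps)
  "Split NAME after the last char in SEPS; return (DIRECTORY . NONDIRECTORY).
DIRECTORY is nil when NAME contains no separator."
  (let ((i (1- (length name))))
    ;; Scan backward for the last separator character.
    (while (and (>= i 0) (not (memq (aref name i) seps)))
      (setq i (1- i)))
    (if (< i 0)
        (cons nil name)
      (cons (substring name 0 (1+ i)) (substring name (1+ i))))))

;; Both ?/ and ?\\ count as separators (the MS-Windows behavior):
(my-split-file-name "c:/foo/bar/*\\.el\\'" '(?/ ?\\))
;; => ("c:/foo/bar/*\\.el\\" . "'")

;; Only ?/ counts as a separator (what the report asks for):
(my-split-file-name "c:/foo/bar/*\\.el\\'" '(?/))
;; => ("c:/foo/bar/" . "*\\.el\\'")
```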


2.

However, I see from the doc string that `directory-sep-char' has
been made obsolete:

 directory-sep-char is a variable defined in `subr.el'.
 Its value is 47

   This variable is obsolete since 21.1;
   do not use it, just use `/'.
   This variable is potentially risky when used as a file local variable.

 Documentation:
 Directory separator character for built-in functions that return file names.
 The value is always ?/. 

That seems misguided, and the buggy behavior noted above is a good
example of why.  The correct way to handle this would be to make
`directory-sep-char' a defconst with value ?/.  And code should always
use this named constant, NOT a literal ?/.  The bugged behavior here
shows why: someone coding  `file-name-nondirectory' seems to have
treated (hard-coded) ?\ as the directory separator on Windows (just a
guess).

Note too that the code has another minor bug: The call to
`make-obsolete-variable' (which should anyway be removed, and the
defvar simply replaced by a defconst) incorrectly uses "`/'" instead
of "?/".  The doc string itself is correct in referring to "?/".

(defconst directory-sep-char ?/
  "Directory separator character for built-in functions that return file names.
The value is always ?/.")
(make-obsolete-variable 'directory-sep-char "do not use it, just use `/'." "21.1")
                                                                     ^^^

In GNU Emacs 24.0.50.1 (i386-mingw-nt5.1.2600) of 2010-09-20 on 3249CTO
 Windowing system distributor `Microsoft Corp.', version 5.1.2600
 configured using `configure --with-gcc (4.4) --no-opt --cflags
 -Ic:/imagesupport/include'
 





Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Mon, 04 Oct 2010 19:27:02 GMT)

Notification sent to "Drew Adams" <drew.adams <at> oracle.com>:
bug acknowledged by developer. (Mon, 04 Oct 2010 19:27:02 GMT)

Message #10 received at 7159-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 7159-done <at> debbugs.gnu.org
Subject: Re: bug#7159: 24.0.50;
	(1) `file-name-(non)directory': bad return values,
	(2)	`directory-sep-char'
Date: Mon, 04 Oct 2010 21:29:40 +0200
> From: "Drew Adams" <drew.adams <at> oracle.com>
> Date: Mon, 4 Oct 2010 10:59:12 -0700
> Cc: 
> 
> BUT:
>  
> (file-name-directory    titi) ; gives "c:/foo/bar/b[^^@]*\\.el\\"
> (file-name-nondirectory titi) ; gives "'"
>  
> These functions should know how to parse titi to produce "c:/foo/bar/"
> and "b[^^@]*\\.el\\'", respectively (where ^@ is the control char).

You are forgetting the backslashes that wildcard-to-regexp inserted.
On DOS and Windows, Emacs treats backslashes as directory separators,
as you'd expect.  So "c:/foo/bar/b[^^@]*\\.el\\" looks like a leading
directory of a file whose basename is "'".

In other words, don't pass a regexp with backslashes to these
functions, because you won't get what you think you will.

> It is not expected that these functions return names that necessarily
> map to actual directories or files.

And indeed, they don't.

> So I suspect that the `file-name-nondirectory' part of this bug
> is at least in part a Windows problem.  The code seems to be
> interpreting the backslash (?\) near the end as a directory
> separator.

It does, by design.

> If so, that is definitely wrong.  Even on Windows, the
> code should use the value of `directory-sep-char', which is ?/,
> not ?\.

On Windows, we support both, and we always will.  Anything else means
a terrible breakage, believe me.  For example, it would be very hard
to parse output of programs that emit file names with backslashes.
With the current setup, this is seamless, even if the file names use
mixed forward- and back-slashes (yes, it happens with GCC and GDB, for
example, or even with Make sometimes).

> However, I see from the doc string that `directory-sep-char' has
> been made obsolete:

In fact, just yesterday it was removed altogether, because it has no
effect on what Emacs does.  That's been like that for years, and we
saw no complaints.

I'm closing this bug.




Did not alter fixed versions and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 04 Oct 2010 22:20:03 GMT)

Reply sent to "Drew Adams" <drew.adams <at> oracle.com>:
You have taken responsibility. (Mon, 04 Oct 2010 22:22:01 GMT)

Notification sent to "Drew Adams" <drew.adams <at> oracle.com>:
bug acknowledged by developer. (Mon, 04 Oct 2010 22:22:02 GMT)

Message #17 received at 7159-done <at> debbugs.gnu.org:

From: "Drew Adams" <drew.adams <at> oracle.com>
To: "'Eli Zaretskii'" <eliz <at> gnu.org>
Cc: 7159-done <at> debbugs.gnu.org
Subject: RE: bug#7159: 24.0.50;
	(1) `file-name-(non)directory': bad return values,
	(2)	`directory-sep-char'
Date: Mon, 4 Oct 2010 15:22:26 -0700
> > (file-name-directory    titi) ; gives "c:/foo/bar/b[^^@]*\\.el\\"
> > (file-name-nondirectory titi) ; gives "'"
> >  
> > These functions should know how to parse titi to produce 
> > "c:/foo/bar/" and "b[^^@]*\\.el\\'", respectively (where ^@
> > is the control char).
> 
> You are forgetting the backslashes that wildcard-to-regexp inserted.

It should be obvious that I am NOT forgetting such backslashes.

> On DOS and Windows, Emacs treats backslashes as directory separators,
> as you'd expect.  So "c:/foo/bar/b[^^@]*\\.el\\" looks like a leading
> directory of a file whose basename is "'".

No.

Well, let me put it another way: That is just what this bug report is about:
Backslashes are NOT directory separators for Emacs - or at least they should not
be.  Even on Windows.  This bug report says we should get rid of any such
vestigial treatment.

As the doc string for `directory-sep-char' indicates, ?/ is the only directory
separator for Emacs - or at least it should be.

 "Directory separator character for built-in functions that
  return file names.  The value is always ?/."

It says "return" rather than "accept or return" because such functions don't yet
DTRT wrt input.  (And it says "built-in" rather than "standard", which would be
better.)

> In other words, don't pass a regexp with backslashes to these
> functions, because you won't get what you think you will.

Correction: You won't get what you should get, which is just the directory or
non-directory portion of the name, respecting ?/ as the only separator.  And
it's not just about regexps - I used that as an example of a name that included
a backslash.

The point was that file-name decomposition functions should pay no attention to
backslashes.  There is no reason they should consider ?\ to be a directory
separator.

After fixing this we will also be able to remove this parenthetical phrase in
the Elisp manual: "(backslash is also allowed in input on MS-DOS or
MS-Windows)".  This is the _only_ (whispered, parenthetical) mention of such a
vestigial crutch.

> > So I suspect that the `file-name-nondirectory' part of this bug
> > is at least in part a Windows problem.  The code seems to be
> > interpreting the backslash (?\) near the end as a directory
> > separator.
> 
> It does, by design.

Bad design, if so.  More likely it is a vestige.  Perhaps it seemed like the
best or the only possible thing to do at the time, but it is not TRT.

> > If so, that is definitely wrong.  Even on Windows, the
> > code should use the value of `directory-sep-char', which is ?/,
> > not ?\.
> 
> On Windows, we support both, and we always will.  Anything else means
> a terrible breakage, believe me.  For example, it would be very hard
> to parse output of programs that emit file names with backslashes.

Parsing output of programs is something altogether different.  You should not
throw that in here.  Emacs standard functions for decomposing file names should
not be tainted with an eye to parsing arbitrary Windows program output.

That is a completely different requirement and should be handled, naturally, by
special-purpose code (i.e. at a different level) - code that knows just what to
expect from those particular programs.

We can have code in Emacs that parses many different kinds of output, including
Windows file names.  But the need for such special-purpose parsing code is
unrelated to general, standard functions that expect a file name.  In Emacs,
such functions should not treat backslashes as directory separators.

There is no need for that.  Why?  Because ?/ as dir separator works fine for
Emacs code even in Windows.  And because ?/ works always, we should use ONLY ?/.

What is the real requirement to support also ?\?  Please don't say that it is
handling the output from some Windows programs - that is a red herring.

Note that this is very different from `path-separator' (";" for Windows, ":"
for UNIX).  In that case, ":" does NOT work for Emacs on Windows - there is no
canonical separator.  But for directories, ?/ _always works_, and it should
therefore be the only char recognized as a dir separator.
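For comparison, Emacs exposes the search-path separator as the variable `path-separator` (";" on MS-Windows, ":" elsewhere), so Lisp code can split a path list without hard-coding either character. In this minimal sketch the sample path string is made up, and `path-separator` is let-bound only so the example behaves the same on any platform. The second form shows why ":" cannot serve on Windows: it also splits at the drive-letter colon.

```elisp
;; Split a Windows-style search path using `path-separator' rather than a
;; literal ";".  The let-binding simulates MS-Windows for illustration.
(let ((path-separator ";"))
  (split-string "c:/emacs/bin;c:/mingw/bin" path-separator t))
;; => ("c:/emacs/bin" "c:/mingw/bin")

;; Using ":" instead breaks each entry at its drive spec:
(split-string "c:/emacs/bin;c:/mingw/bin" ":" t)
;; => ("c" "/emacs/bin;c" "/mingw/bin")
```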

For general file-name functions, that is.  Nothing prevents some specialized
Windows parsing code from processing Windows file names that use ?\ (e.g.
creating a file name that uses the standard separator, which can then be handled
in the standard way).

> With the current setup, this is seamless, 

Well, it's apparently been hard-coded here and there to such an extent that you
are screaming that there would be a lot to change to clean it up.  That in
itself is a hefty price for such "seamlessness".

But the real price is the loss of simple standard functions for manipulating
file names correctly.  By pushing special-purpose parsing into the code
everywhere you might think things have been made "seamless", but in fact a muddy
mess has been created.

Emacs's handling of ?\ in a file name output by an external program should
proceed in two stages: (1) translation to an Emacs file name (if needed), which
means using ?/ as separator, then (2) handling of the Emacs file name using the
standard file-name functions (e.g. `file-name-directory').  That's the clean way
to handle such special-casing.  (And any such use of special-case parsing should
be the exception, not the rule.)
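The two-stage approach proposed above might look like the following minimal sketch. `my-normalize-separators` is a hypothetical name, not an Emacs function; it relies on the real `subst-char-in-string` to turn every backslash into a slash. That makes it suitable only for strings already known to be file names using ?\\ as a separator, since it would also rewrite backslashes that are not separators (regexp escapes, for instance).

```elisp
;; Hypothetical boundary function: translate an externally produced
;; MS-Windows file name into one that uses only ?/ as the separator.
(defun my-normalize-separators (name)
  "Return a copy of NAME with every backslash replaced by a slash."
  ;; Without the INPLACE argument, NAME itself is left unmodified.
  (subst-char-in-string ?\\ ?/ name))

;; Stage 1: translate at the point where the name enters Emacs.
;; Stage 2: decompose it with the standard functions.
(let ((name (my-normalize-separators "c:\\src\\emacs/lisp\\subr.el")))
  (list name
        (file-name-directory name)
        (file-name-nondirectory name)))
;; => ("c:/src/emacs/lisp/subr.el" "c:/src/emacs/lisp/" "subr.el")
```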

> even if the file names use mixed forward- and back-slashes (yes, it
> happens with GCC and GDB, for example, or even with Make sometimes).

Again, there is nothing wrong with having specialized code that handles such
cases on an individual basis, if they require it.  But the general file-name
handling code of Emacs should handle _Emacs_ file names, which use only ?/ as
the dir separator.

You are muddying the waters by throwing in lots of other stuff here.  Of
_course_ it can happen that some program might need to parse special syntax -
any special syntax.  But this is about the normal Emacs syntax for file names.
And for that syntax the Emacs directory separator is ?/.

If some particular Emacs code is forced by some other code (e.g. GDB) to digest
a name that uses both ?/ and ?\ as directory separators (how horrible), then
appropriate Emacs code can be used to fix such names before Emacs tries to deal
with them using the standard file-name functions (e.g. `file-name-directory').

IOW, tack a translation mapping onto the output of GDB or Make or whatever to
standardize such bastard file names (w/ mixed separators).  That can be done by
Emacs, but we should not foul the standard Emacs file-name handling with such
considerations.

"Seamless", indeed.  Putting special-case handling throughout the code doesn't
make things seamless; it makes them quite seamy.

> > However, I see from the doc string that `directory-sep-char' has
> > been made obsolete:
> 
> In fact, just yesterday it was removed altogether, because it has no
> effect on what Emacs does.  That's been like that for years, and we
> saw no complaints.

The complaint/suggestion wrt `directory-sep-char' is only that it should be a
constant.  We should not be advising people to hard-code ?/, but rather to use a
constant with a name that proclaims what it is and with a value of ?/.  But this
is only a minor, stylistic concern.  It is not directly related to this bug.

> I'm closing this bug.

I'm reopening it.  To me, this is broken, and this dysfunction is not an
inevitable price to be paid because GDB or whatever outputs Windows file names
using backslashes.  That argument is a copout.

The simple functions `file-name-directory' and `file-name-nondirectory' should
be robust enough to just remove the non-directory and directory portion -
always.  That should be so regardless of the presence of backslashes.

Those functions are broken on Windows when backslashes are present.  If you
don't want to fix this bug, fine; maybe someone else will someday.

Maybe you don't want to make the effort required to remove such ad-hoc backslash
handling here and there from the Windows Emacs code, but maybe someone else will
someday.  I believe you that the effort might be great, and I accept that
therefore this cannot be a high priority now (there are _many_ outstanding
bugs).  But that does not mean that we currently handle Windows file names
correctly.  That we choose not to fix something now does not imply that it
doesn't need fixing.

Your reason, "Anything else means a terrible breakage, believe me" suggests that
the fix is non-trivial because there is (apparently) lots of code here and there
that still special-cases backslashes, on Windows.  Your example of such
breakage, "to parse output of programs that emit file name with backslashes",
suggests that you do not distinguish parsing Windows program output from Emacs's
general-purpose file-name handling functions.

It is not right to mess up general-purpose file-name-handling functions just for
the benefit of some special-purpose Windows-output parsing here and there.
Write Windows-output specific code to do that according to the particular case
(need), and make the general-purpose file-name handling functions do as they
logically should: recognize ?/ as the only directory separator.





Message #18 received at 7159-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 7159-done <at> debbugs.gnu.org
Subject: Re: bug#7159: 24.0.50;
	(1) `file-name-(non)directory': bad return values,
	(2)	`directory-sep-char'
Date: Tue, 05 Oct 2010 06:10:48 +0200
> From: "Drew Adams" <drew.adams <at> oracle.com>
> Cc: <7159-done <at> debbugs.gnu.org>
> Date: Mon, 4 Oct 2010 15:22:26 -0700
> 
> Well, let me put it another way: That is just what this bug report is about:
> Backslashes are NOT directory separators for Emacs - or at least they should not
> be.  Even on Windows.

We will have to disagree on that.  I've been porting Unix programs to
DOS and Windows for the last 20 years, and in my experience, any
program that does not treat these two flavors equivalently is
fundamentally broken on these systems.

> > In other words, don't pass a regexp with backslashes to these
> > functions, because you won't get what you think you will.
> 
> Correction: You won't get what you should get, which is just the
> directory or non-directory portion of the name, respecting ?/ as the
> only separator.

These two functions are not supposed to be handed regexps anyway, even
on Unix.  For example, what's the filename part of "/foo\\(/a?\\)bar"?

  (file-name-nondirectory "/foo\\(/a?\\)bar") => "a?\\)bar"

Or how about this:

  (file-name-nondirectory "/foo[^/]*") => "]*"

This simply doesn't work, on any system.  Even in your example, you
cheated: you passed a substring of a regexp, to avoid the problems
with leading drive spec, which must be at the beginning of the file
name to work with these functions.

> > On Windows, we support both, and we always will.  Anything else means
> > a terrible breakage, believe me.  For example, it would be very hard
> > to parse output of programs that emit file name with backslashes.
> 
> Parsing output of programs is something altogether different.

It's not different.  These functions are used all the time for parsing
file names, including those in output of other programs.

There are also users who use backslashes in their ~/.emacs files, when
they specify file names and programs.

> That is a completely different requirement and should be handled, naturally, by
> special-purpose code (i.e. at a different level) - code that knows just what to
> expect from those particular programs.

If you do this, you will flood the application sources with ugly
system-dependent conditions.  Hardly a good idea.

> What is the real requirement to support also ?\?

That it is used in DOS/Windows file names in many situations.

> "Seamless", indeed.  Putting special-case handling throughout the code doesn't
> make things seamless; it makes them quite seamy.

The special-case code has to be somewhere.  Having it at the current
low level in C, hidden from the Lisp programs, is the best we can do.

> The simple functions `file-name-directory' and `file-name-nondirectory' should
> be robust enough to just remove the non-directory and directory portion -
> always.

It does that today.




Message #19 received at 7159-done <at> debbugs.gnu.org:

From: "Drew Adams" <drew.adams <at> oracle.com>
To: "'Eli Zaretskii'" <eliz <at> gnu.org>
Cc: 7159-done <at> debbugs.gnu.org
Subject: RE: bug#7159: 24.0.50;
	(1) `file-name-(non)directory': bad return values,
	(2)	`directory-sep-char'
Date: Mon, 4 Oct 2010 22:06:26 -0700
> > > In other words, don't pass a regexp with backslashes to these
> > > functions, because you won't get what you think you will.
> > 
> > Correction: You won't get what you should get, which is just the
> > directory or non-directory portion of the name, respecting ?/ as the
> > only separator.
> 
> These two functions are not supposed to be handed regexps anyway, even
> on Unix.  For example, ...

No one said that anyone is likely to pass a regexp in place of a file name.
Sorry if my example misled you.

The point is that these _general_, standard functions for simply removing a file
name's (non)directory portion should be able to handle _backslash_ characters
without interpreting them as directory separators - on Windows as on Unix or any
other platform.

I said that the use of a regexp in the example I gave was just that: an example
of a name that contains backslashes.  Nothing more.  I probably should have just
used a literal string, to avoid confusion.

These functions should DTRT with such a name, whether or not it corresponds to a
real file (you agreed with that).  Our difference of opinion is wrt whether a
backslash should ever be considered as a (second kind of) directory separator in
Emacs: you say yes (for Windows); I say no.  I say that even on Windows ?/ is
enough; there is no need for two dir separators in Emacs file names.

You say Emacs must recognize ?\ today at least, because mumblemumble things are
complicated.  I say that even if that is so (and I believe you that it is),
that's not the same as claiming that that _should_ be so.  This is a bug, a poor
design/implementation decision, that we can hope to fix at some point.

> what's the filename part of "/foo\\(/a?\\)bar"?
>   (file-name-nondirectory "/foo\\(/a?\\)bar") => "a?\\)bar"
> Or how about this:
>   (file-name-nondirectory "/foo[^/]*") => "]*"

The answer, for Emacs, should simply be to interpret the chars other than ?/ as
file-name chars, not as directory separators.  It has nothing to do with
interpreting regexp syntax.  It has only to do with interpreting a directory
separator.  The only question/disagreement is wrt ?\ as a directory separator.
IMO, it should not be treated as such.

> > Parsing output of programs is something altogether different.
> 
> It's not different.  These functions are used all the time for parsing
> file names, including those in output of other programs.

But they should be used to parse Emacs file names (i.e., names that use only ?/
as dir separator), nothing more.

If the output of some program is &(*^*&#HI&*U@);';.1?>>!, and that program
considers that to be a file name for some platform, that is (or should be)
irrelevant to standard Emacs file-name decomposition.  After your specialized
code translates that name to its Emacs file name of, say, /foo/bar/toto.c,
_then_ that is something that the standard file-name functions can decompose.

That's all I'm suggesting: keep `file-name-(non)directory' for Emacs file names,
where that notion is platform-independent wrt dir separator.  Use other code as
needed to translate to names that use only ?/ as dir separator.

> There are also users who use backslashes in their ~/.emacs files, when
> they specify file names and programs.

So what?  Again, however & whenever Emacs receives such names, it can use code
that translates them to Emacs file names (names with ?/ as dir separator).

> > That is a completely different requirement and should be 
> > handled, naturally, by special-purpose code (i.e. at a
> > different level) - code that knows just what to
> > expect from those particular programs.
> 
> If you do this, you will flood the application sources with ugly
> system-dependent conditions.  Hardly a good idea.

I didn't say that the "application sources" should do that (though I'm not sure
what you mean by that term).

I said that if some Emacs code expects a "file name" in some format different
from the standard Emacs syntax - i.e., with some other directory separator, then
specialized _Emacs_ code that recognizes such a format can translate it to the
standard format: replace ?\ or another directory separator by ?/, the directory
separator used by Emacs.

The Emacs code that receives such a non-standard (for Emacs) format is the code
that should deal with this.  Not the application code (if by that you mean the
code that produces such output).

> > What is the real requirement to support also ?\?
> 
> That it is used in DOS/Windows file names in many situations.

Not inside Emacs.  It's not needed.  For Emacs, ?/ is sufficient even for
Windows.  So it should suffice - there is no (logical) need for Emacs to have
two standard directory separators (on Windows).

There might be a historical (legacy) reason why we have two today, but only one
is needed: ?/ _always_ works within Emacs.

> > "Seamless", indeed.  Putting special-case handling 
> > throughout the code doesn't make things seamless; it makes
> > them quite seamy.
> 
> The special-case code has to be somewhere.  Having it at the current
> low level in C, hidden from the Lisp programs, is the best we can do.

I agree that it has to be somewhere.  And I recognize that you are far more
familiar with the Emacs implementation (and with Windows) than I.  My point is
that there is no logical reason why the _standard_, _general_ Emacs functions
for decomposing file names (within Emacs) should have to recognize two different
chars as directory separators.  That is just not necessary, since ?/ is all that
is needed, even for Emacs on Windows.  Emacs already DTRT for ?/ on Windows.

You say that Emacs can sometimes receive file names output by some external
programs in a syntax that uses ?\ as dir separators.  I say fine, then the Emacs
code that receives such names can translate ?\ to ?/, so that when it comes to
decomposing an Emacs file name we can use simple, standard functions that expect
only ?/ as dir separator. 

That's the difference in our points of view, as I see it.  And again, you have a
better idea of where those places are that Emacs can expect to receive such
file-name syntax (you mentioned GDB, Make, .emacs...).  Those are the places
where I'd suggest we make the transition to the standard syntax (with only ?/ as
separator).  It is the code that accepts/receives such names that should digest
them to produce standard Emacs file names (i.e., with only ?/ separators).

IOW, if the problem is _external_ programs and an _external_ syntax, then do the
translation at the external/internal boundary: at the point where Emacs gets the
file name from outside.  Don't do it (in effect) at each call to a standard
name-decomposition function.  The translation code could be in C or Lisp for all
I care.  All I would like to see is simple file-name decomposition functions -
no special-casing of ?\ on Windows.

> > The simple functions `file-name-directory' and 
> > `file-name-nondirectory' should be robust enough to just
> > remove the non-directory and directory portion - always.
> 
> It does that today.

No, they do not, because they falsely interpret ?\ as a directory separator (and
only on Windows).






Message #20 received at 7159-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Drew Adams" <drew.adams <at> oracle.com>
Cc: 7159-done <at> debbugs.gnu.org
Subject: Re: bug#7159: 24.0.50;
	(1) `file-name-(non)directory': bad return values,
	(2)	`directory-sep-char'
Date: Tue, 05 Oct 2010 03:09:30 -0400
> From: "Drew Adams" <drew.adams <at> oracle.com>
> Cc: <7159-done <at> debbugs.gnu.org>
> Date: Mon, 4 Oct 2010 22:06:26 -0700
> 
> No one said that anyone is likely to pass a regexp in place of a file name.
> Sorry if my example misled you.

What other real-life use-cases exist that require such
functionality?

> You say Emacs must recognize ?\ today at least, because mumblemumble things are
> complicated.  I say that even if that is so (and I believe you that it is),
> that's not the same as claiming that that _should_ be so.  This is a bug, a poor
> design/implementation decision, that we can hope to fix at some point.

It isn't a bug, it's a feature that is necessary on DOS and Windows.
"Fixing" that would introduce bugs, some subtle, others glaring.  So
these primitives, which are widely used in Emacs's own Lisp sources,
must retain their equal handling of both flavors of slashes in file
names.  If you still disagree, let's leave it at that, because we will
never agree.

> 
> > what's the filename part of "/foo\\(/a?\\)bar"?
> >   (file-name-nondirectory "/foo\\(/a?\\)bar") => "a?\\)bar"
> > Or how about this:
> >   (file-name-nondirectory "/foo[^/]*") => "]*"
> 
> The answer, for Emacs, should simply be to interpret the chars other than ?/ as
> file-name chars, not as directory separators.  It has nothing to do with
> interpreting regexp syntax.  It has only to do with interpreting a directory
> separator.

But the above output doesn't make sense.  The result is by no means
what file-name-directory and file-name-nondirectory are documented to
produce.  And the reason is that the argument is not a file name.  So
why does what you are asking make sense, and when will it be useful in
Emacs in practice?

> > It's not different.  These functions are used all the time for parsing
> > file names, including those in output of other programs.
> 
> But they should be used to parse Emacs file names (i.e., names that use only ?/
> as dir separator), nothing more.

No.  They are designed to parse file names that are valid on the
underlying OS.

> That's all I'm suggesting: keep `file-name-(non)directory' for Emacs file names,
> where that notion is platform-independent wrt dir separator.  Use other code as
> needed to translate to names that use only ?/ as dir separator.
> 
> > There are also users who use backslashes in their ~/.emacs files, when
> > they specify file names and programs.
> 
> So what?  Again, however & whenever Emacs receives such names, it can use code
> that translates them to Emacs file names (names with ?/ as dir separator).

But we already use that "other code": these two primitives (and
others) which DTRT with any file name that is valid on Windows.
There's no need to change anything.

It appears that you are asking for an additional set of functions,
which ignore backslashes on Windows.  If such functions are to become
part of Emacs, we need to hear the practical use-cases where they
would be useful.  You presented a single example, which you now say
was not relevant.  Please present relevant examples that would justify
yet another set of file-name APIs.  Otherwise, you can always write
such functions yourself, it's hardly a big job.

Btw, I suggest moving the rest of the discussion to emacs-devel, as
it's no longer relevant to the original bug report.  That mailing list
has more subscribers than the bug-reporting list, who may contribute
to the discussion.

> > If you do this, you will flood the application sources with ugly
> > system-dependent conditions.  Hardly a good idea.
> 
> I didn't say that the "application sources" should do that (though I'm not sure
> what you mean by that term).
> 
> I said that if some Emacs code expects a "file name" in some format different
> from the standard Emacs syntax - i.e., with some other directory separator, then
> specialized _Emacs_ code that recognizes such a format can translate it to the
> standard format: replace ?\ or another directory separator by ?/, the directory
> separator used by Emacs.

This code will be specific to Windows, and will clutter Lisp
application-level sources, such as gud.el, grep.el, compile.el,
etc. with the kinds of
  `(if (eq system-type 'windows-nt) fix-file-names-for-drew)'.
That's ugly and unergonomic.  The current situation, though not ideal,
is much better.

> > The special-case code has to be somewhere.  Having it at the current
> > low level in C, hidden from the Lisp programs, is the best we can do.
> 
> I agree that it has to be somewhere.  And I recognize that you are far more
> familiar with the Emacs implementation (and with Windows) than I.  My point is
> that there is no logical reason why the _standard_, _general_ Emacs functions
> for decomposing file names (within Emacs) should have to recognize two different
> chars as directory separators.

Yes, there's a perfectly valid reason: because these primitives are
used everywhere in Emacs packages, and those packages don't want to
know about differences in file format between Posix and Windows
platforms.

Again, we do that in a lot of places, most of which I don't even
remember.  The reason I can safely forget about them is _precisely_
that Lisp code doesn't have to worry about these issues, because the
primitives DTRT.  But here's one more example I just recalled: type
"M-x getenv RET PATH RET" and look at the value.  I'm sure if I think
more, I will recall more examples.  But why waste energy on a problem
that doesn't exist?




Message #21 received at 7159-done <at> debbugs.gnu.org (full text, mbox):

From: "Drew Adams" <drew.adams <at> oracle.com>
To: "'Eli Zaretskii'" <eliz <at> gnu.org>
Cc: 7159-done <at> debbugs.gnu.org
Subject: RE: bug#7159: 24.0.50;
	(1) `file-name-(non)directory': bad return values,
	(2)	`directory-sep-char'
Date: Tue, 5 Oct 2010 07:58:23 -0700
> What other real-life use-cases exist that require such a
> functionality?

Ah, the "real world" argument.  The old no-one-would-ever-do-that refrain. Why
would a user or a program ever try to decompose a file name that contains
backslashes?

Answers: (1) Users and programs will always do what you don't expect - and
sometimes there is no reason they shouldn't.  (2) Occam's razor: There is no
reason to special-case backslash here: slash alone is sufficient.  If not true,
then treat ?\ as a dir separator in Emacs on Unix also.

> > You say Emacs must recognize ?\ today at least, because 
> > mumblemumble things are complicated.  I say that even if
> > that is so (and I believe you that it is),
> > that's not the same as claiming that that _should_ be so.  
> > This is a bug, a poor design/implementation decision, that
> > we can hope to fix at some point.
> 
> It isn't a bug, it's a feature that is necessary on DOS and Windows.

What's not necessary is treating ?\ as a dir separator in Emacs, even on
Windows.  That's clear.

> "Fixing" that would introduce bugs, some subtle, others glaring.  So
> these primitives, which are widely used in Emacs's own Lisp sources,
> must retain their equal handling of both flavors of slashes in file
> names.  If you still disagree, let's leave it at that, because we will
> never agree.

Yes, I still disagree - or rather you do.  That is why I asked to leave the bug
open in hopes that someone else will eventually fix it.  It's clear that you
disagree that there is a bug.

Those places you refer to can be fixed.  Not fixing the standard decomposer
functions makes the bug a self-fulfilling prophecy:  Of course these functions
will be widely used with no intermediary in places where backslashes are
present.  Today, they interpret backslashes, so naturally the mess is
widespread.

This can be cleaned up progressively:

1. The first step is to remove mention of ?\ from the doc for these functions,
and thus not encourage people to depend on this behavior.

2. The second step is to start fixing code where these functions are used
directly in contexts where ?\ might enter Emacs within a file name, translating
such an outside format by replacing ?\ with ?/.

3. The final step is to stop those functions from recognizing ?\ as a dir
separator.  Only in this final step will any remaining bugs surface: places
where we neglected to clean up the file name before passing it to these
functions.

If the second step is done right then the transition will be "seamless".  Any
bugs uncovered after fixing the function definitions (code) will be the
exception.
 
> > The answer, for Emacs, should simply be to interpret the 
> > chars other than ?/ as file-name chars, not as directory
> > separators.  It has nothing to do with interpreting regexp syntax.
> > It has only to do with interpreting a directory separator.
> 
> But the above output doesn't make sense.

It does if that is the argument passed.  We have agreed that the argument need
not name an existing file.  These functions should be robust and simple enough
that they do nothing other than split the string at the last directory
separator.
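The "split the string at the last directory separator" semantics can be made concrete with a minimal sketch. It is written in Python rather than Emacs Lisp and is not Emacs's implementation: `split_file_name` is a hypothetical helper that returns "" instead of nil when there is no separator, and it ignores drive-letter prefixes such as "c:foo", which the real Windows primitives treat specially. Passing both ?/ and ?\ as separators approximates the current Windows behavior the report objects to.

```python
from __future__ import annotations


def split_file_name(name: str, separators: str = "/") -> tuple[str, str]:
    """Split NAME just after the last character that appears in SEPARATORS.

    Returns (directory, nondirectory).  The directory part keeps its
    trailing separator and is "" when NAME contains no separator at all.
    """
    # str.rfind returns -1 when a separator is absent, so cut == -1 means
    # "no separator": the whole name becomes the nondirectory part.
    cut = max(name.rfind(sep) for sep in separators)
    return name[: cut + 1], name[cut + 1 :]


# The string from the report: c:/foo/bar/b[^<NUL>]*\.el\'
titi = "c:/foo/bar/b[^\x00]*\\.el\\'"

# Slash-only splitting, as the report proposes.
print(split_file_name(titi))
# Treating both ?/ and ?\ as separators, as Emacs does on Windows.
print(split_file_name(titi, "/\\"))
```

With both separators, the directory part ends in `\.el\` and the nondirectory part is just the final quote character, which matches the `file-name-directory` result shown at the top of the report.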

> The result is by no means what file-name-directory and
> file-name-nondirectory are documented to produce.
> And the reason is that the argument is not a file name.

You already agreed that the argument need not be a file name.  It might not name
an existing file.  And the functions should be robust and simple enough to DTRT
with any string, even a string that could not possibly be a file name.

But that latter part is a bit beside the point here.  Nothing in the bug report
_requires_ this to be about names that could never name a file.  This is only
about names that contain backslashes.  It is only about not having these
functions treat ?\ as a dir separator.

> So why does what you are asking make sense, and when will it be
> useful in Emacs in practice?
> 
> > > It's not different.  These functions are used all the 
> > > time for parsing file names, including those in output
> > > of other programs.
> > 
> > But they should be used to parse Emacs file names (i.e., 
> > names that use only ?/ as dir separator), nothing more.
> 
> No.  They are designed to parse file names that are valid on the
> underlying OS.

That's where we disagree.  I can't speak to their original intent, but what they
should do is retrieve the directory and non-directory portions of an _Emacs_
file name, where the latter means a name that uses ?/ as dir separator.

These functions should know nothing about the underlying OS.  They should be
handed _Emacs_ file names, that is, names with ?/ as the dir separator.

> > That's all I'm suggesting: keep `file-name-(non)directory' 
> > for Emacs file names, where that notion is platform-independent
> > wrt dir separator.  Use other code as needed to translate to
> > names that use only ?/ as dir separator.
> > 
> > > There are also users who use backslashes in their 
> > > ~/.emacs files, when they specify file names and programs.
> > 
> > So what?  Again, however & whenever Emacs receives such 
> > names, it can use code that translates them to Emacs file
> > names (names with ?/ as dir separator).
> 
> But we already use that "other code": these two primitives (and
> others) which DTRT with any file name that is valid on Windows.
> There's no need to change anything.

It should be clear from my use above that by "other" I meant "other than these
functions".  These standard functions should be only for operating on Emacs file
names.  "Other" means platform-specific translation to platform-independent
Emacs file names (with ?/ separators).  

> It appears that you are asking for an additional set of functions,
> which ignore backslashes on Windows.

No, I am asking for _these_ functions to stop being special-cased according to
the platform, to stop treating ?\ as dir separator on Windows (since ?/ works on
Windows too).

This is about separating out the platform-specific treatment to only the places
where it is needed, and having the standard functions that access parts of a
file name use and expect the standard Emacs file-name syntax: ?/ as dir
separator.

> If such functions are to become
> part of Emacs, we need to hear the practical use-cases where they
> would be useful.  You presented a single example, which you now say
> was not relevant.  Please present relevant examples that would justify
> yet another set of file-name APIs.  Otherwise, you can always write
> such functions yourself, it's hardly a big job.
> 
> Btw, I suggest moving the rest of the discussion to emacs-devel, as
> it's no longer relevant to the original bug report.  That mailing list
> has more subscribers than the bug-reporting list, who may contribute
> to the discussion.

Just leave the bug open please.  Mark it as "wishlist" if such is your wont.  I
don't have time to argue anymore about this.  I've made my argument here clear.

> > > If you do this, you will flood the application sources with ugly
> > > system-dependent conditions.  Hardly a good idea.
> > 
> > I didn't say that the "application sources" should do that 
> > (though I'm not sure what you mean by that term).
> > 
> > I said that if some Emacs code expects a "file name" in 
> > some format different from the standard Emacs syntax - i.e.,
> > with some other directory separator, then specialized _Emacs_
> > code that recognizes such a format can translate it to the
> > standard format: replace ?\ or another directory separator 
> > by ?/, the directory separator used by Emacs.
> 
> This code will be specific to Windows, and will clutter Lisp
> application-level sources, such as gud.el, grep.el, compile.el,
> etc. with the kinds of `(if (eq system-type 'windows-nt)
> fix-file-names-for-drew)'.  That's ugly and unergonomic.
> The current situation, though not ideal, is much better.

(standard-file-name the-input) is exactly what we should have.  It shows clearly
what is involved.  It is such a function that would do the
`(if (eq system-type 'windows-nt) replace-\-by-/)'.

The point is that at places in the code where you see (file-name-directory
(standard-file-name the-input)) it will be clear that `the-input' might not be
in the standard Emacs format (with only ?/ as dir separator).

More typically, such places will call `standard-file-name' only once to convert
the external input once and for all.  From then on, Emacs will be dealing only
with a standard file name.  Everywhere you do not see `standard-file-name' you
will be sure that the file name is an Emacs name (?/ as separator).
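A minimal sketch of the proposed division of labor, in Python for illustration: convert once where external text enters, then decompose using ?/ only. Both `standard_file_name` and `file_name_directory` here are hypothetical stand-ins, not Emacs functions. The sketch assumes the helper does nothing but replace ?\ by ?/ on Windows-like systems; the `system_type` strings mirror Emacs's `system-type` values.

```python
from __future__ import annotations

# Emacs `system-type' values for which the hypothetical helper converts.
WINDOWS_LIKE = ("windows-nt", "ms-dos")


def standard_file_name(name: str, system_type: str) -> str:
    """Hypothetical helper named after the proposal: replace ?\\ by ?/.

    Only on Windows-like systems; elsewhere a backslash is an ordinary
    file-name character and is left alone.  This is NOT Emacs's
    `convert-standard-filename', which does more than swap separators.
    """
    if system_type in WINDOWS_LIKE:
        return name.replace("\\", "/")
    return name


def file_name_directory(name: str) -> str | None:
    """Slash-only directory part, or None (standing in for nil) if no ?/."""
    cut = name.rfind("/")
    return name[: cut + 1] if cut >= 0 else None


# Convert once, where external text enters Emacs...
raw = "c:\\src\\emacs\\lisp\\files.el"
name = standard_file_name(raw, "windows-nt")
# ...so that later decomposition only has to know about ?/.
print(file_name_directory(name))
```

The helper deliberately does nothing beyond swapping separators. Whether such a bare replacement is enough in practice is exactly the point disputed later in the thread.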

> > > The special-case code has to be somewhere.  Having it at 
> > > the current low level in C, hidden from the Lisp programs,
> > > is the best we can do.
> > 
> > I agree that it has to be somewhere.  And I recognize that 
> > you are far more familiar with the Emacs implementation
> > (and with Windows) than I.  My point is that there is no
> > logical reason why the _standard_, _general_ Emacs functions
> > for decomposing file names (within Emacs) should have to 
> > recognize two different chars as directory separators.
> 
> Yes, there's a perfectly valid reason: because these primitives are
> used everywhere in Emacs packages, and those packages don't want to
> know about differences in file format between Posix and Windows
> platforms.

They need not know.  Only places where an external-format name is introduced
into Emacs need to call `standard-file-name' (or whatever name is used).  And even
if those places are also numerous, they need to call it only once.

Once the file name has been converted to the standard Emacs form (only ?/ as
separator), it can travel on its merry way throughout Emacs, with no code
needing to worry about anything platform-dependent in the name.

> Again, we do that in a lot of places, most of which I don't even
> remember.  The reason I can safely forget about them is _precisely_
> that Lisp code doesn't have to worry about these issues, because the
> primitives DTRT.

You don't know where they are, and cannot tell, precisely because there is no
explicit call to a function that translates to the standard form.

If every such location where an external format might enter Emacs had a call to
`standard-file-name' then (a) we would easily recognize those places and (b) all
other code would be sure to be dealing with simple, standard, Emacs file names.

It's not about you remembering all such locations.  It's about identifying them
clearly, making _them_ alone do the translation, explicitly.

> But here's one more example I just recalled: type
> "M-x getenv RET PATH RET" and look at the value.  I'm sure if I think
> more, I will recall more examples.  But why waste energy on a problem
> that doesn't exist?

To convert a PATH you need only iterate wrt `path-separator', calling
`standard-file-name' on each path component.  Again, doing that makes it
explicit at that point in the code that what is being handled is a list of file
names that are not necessarily in standard form.
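The PATH conversion described above (iterate over the entries delimited by `path-separator`, which is ";" on Windows, and standardize each one) might look like the following Python sketch. `standardize_search_path` is a hypothetical helper, and it assumes Windows-style input in which ?\ is a directory separator.

```python
from __future__ import annotations


def standardize_search_path(value: str, path_separator: str) -> list[str]:
    """Split a PATH-style VALUE at PATH_SEPARATOR, using ?/ in every entry.

    Hypothetical helper that assumes Windows-style input, where ?\\ is a
    directory separator.  Entries keep their order, and empty entries are
    kept rather than silently dropped.
    """
    return [entry.replace("\\", "/") for entry in value.split(path_separator)]


windows_path = "C:\\Windows\\system32;C:\\Program Files\\Git\\bin"
print(standardize_search_path(windows_path, ";"))
```

The sketch only builds a list for lookups inside Emacs. It does not change the PATH that is passed to subprocesses.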

We can agree to disagree - I don't think we're going to convince each other.
Please leave the bug report open, for possible consideration in the future or by
others.





Message #22 received at 7159-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 7159-done <at> debbugs.gnu.org
Subject: Re: bug#7159: 24.0.50;
	(1) `file-name-(non)directory': bad return values,
	(2)	`directory-sep-char'
Date: Tue, 05 Oct 2010 21:17:17 +0200
> From: "Drew Adams" <drew.adams <at> oracle.com>
> Cc: <7159-done <at> debbugs.gnu.org>
> Date: Tue, 5 Oct 2010 07:58:23 -0700
> 
> > What other real-life use-cases exist that require such a
> > functionality?
> 
> Ah, the "real world" argument.

What argument?  I asked a reasonable question, and from someone who
requests changes to existing APIs it is not unreasonable to expect a
decent answer.  At the very least, you could tell us about your own
use-case, which prompted you to ask for this.

If you aren't prepared to present your case, then I don't see any
reason to continue discussing this, let alone consider any changes in
veteran and well-established APIs.

> Answers: (1) Users and programs will always do what you don't expect - and
> sometimes there is no reason they shouldn't.  (2) Occam's razor: There is no
> reason to special-case backslash here: slash alone is sufficient.  If not true,
> then treat ?\ as a dir separator in Emacs on Unix also.

That's not an answer, sorry.  I asked for specific, practical
use-cases, not theoretical "why not?" philosophical arguments.
We don't generally implement APIs in Emacs for situations that don't
happen in practice.  Life's too short.

> You already agreed that the argument need not be a file name.

No, I didn't.  I agreed that the argument need not name an existing
file.  But it still must be a valid file name, by the rules of the
underlying filesystem.  We sometimes extend these rules (cf. Tramp and
the "remote" file names in general) to allow Emacs-specific features,
but we never restrict them.

> > No.  They are designed to parse file names that are valid on the
> > underlying OS.
> 
> That's where we disagree.

Yes, we do.

> These functions should know nothing about the underlying OS.  They should be
> handed _Emacs_ file names, that is, names with ?/ as the dir separator.

Not going to happen.  At most, you may have (or write) additional
functions.  An attempt at imposing on the Emacs developers to go
through all the sources and change code that works perfectly well is
not going to fly.

> > But we already use that "other code": these two primitives (and
> > others) which DTRT with any file name that is valid on Windows.
> > There's no need to change anything.
> 
> It should be clear from my use above that by "other" I meant "other than these
> functions".  These standard functions should be only for operating on Emacs file
> names.  "Other" means platform-specific translation to platform-independent
> Emacs file names (with ?/ separators).  

What is so special about "these" functions that you insist that they
and only they do what you want?  What's in a name?

> This is about separating out the platform-specific treatment to only the places
> where it is needed

My point, which I obviously fail to drive home, is that they are
needed everywhere we use file names in Emacs.

> (standard-file-name the-input) is exactly what we should have.

Please take a second look at that function (or, rather, at
w32-convert-standard-filename, its w32 implementation).  You will see
that it does things that are far from the "simple" slash parsing you
asked for.  For example, if we use w32-convert-standard-filename
indiscriminately, as you suggest, we will be unable to support remote
file names on Windows.

> The point is that at places in the code where you see (file-name-directory
> (standard-file-name the-input)) it will be clear that `the-input' might not be
> in the standard Emacs format (with only ?/ as dir separator).

You are asking Emacs developers to:

 . find all the places in Lisp code that may potentially get file
   names with backslashes

 . add a call to some function to all those places

 . somehow remember to insert a call to that function to any place in
   future Lisp code that is added to Emacs

 . remember to teach all Emacs contributors who work on platforms
   other than Windows to use that paradigm in their contributions

All that when we already have APIs that work correctly everywhere, and
don't suffer from the above maintenance burden.  Now, please try to
guess what the chances are of this happening, even if I did agree with
your arguments.

> More typically, such places will call `standard-file-name' only once to convert
> the external input once and for all.

There _is_ no "only once".  Emacs does not get _filenames_ from all
those sources, it gets _text_.  To understand which parts of that text
are file names, it needs to analyze the text.  That analysis is not
always in a single place and on a single level.  All those places need
to be changed, and for no good reason, because they already work fine.

> > But here's one more example I just recalled: type
> > "M-x getenv RET PATH RET" and look at the value.  I'm sure if I think
> > more, I will recall more examples.  But why waste energy on a problem
> > that doesn't exist?
> 
> To convert a PATH you need only iterate wrt `path-separator', calling
> `standard-file-name' on each path component.

That would break some programs when Emacs invokes them on Windows and
DOS, because they are not prepared to handle directories with forward
slashes in PATH.  That's the reason we don't translate PATH to use
forward slashes to begin with.

You see, life isn't as simple as you think it is.  Please try to give
a bit more respect to what we have now in Emacs, and please don't
assume that its current shape is due to some omission or lack of
foresight, but rather that it's based on a lot of experience and grey
hair.

> Please leave the bug report open, for possible consideration in the future or by
> others.

No.  The bug is there for everyone to see and "consider", but there's
no reason to leave it open, because I will do my best to block any
"fixes" like that to those APIs.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 03 Nov 2010 11:24:04 GMT) Full text and rfc822 format available.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.