GNU bug report logs - #7159
24.0.50; (1) `file-name-(non)directory': bad return values, (2) `directory-sep-char'


Package: emacs;

Reported by: "Drew Adams" <drew.adams <at> oracle.com>

Date: Mon, 4 Oct 2010 17:59:02 UTC

Severity: minor

Tags: wontfix

Found in version 24.0.50

Done: "Drew Adams" <drew.adams <at> oracle.com>

Bug is archived. No further changes may be made.

Full log


Message #21 received at 7159-done <at> debbugs.gnu.org (full text, mbox):

From: "Drew Adams" <drew.adams <at> oracle.com>
To: "'Eli Zaretskii'" <eliz <at> gnu.org>
Cc: 7159-done <at> debbugs.gnu.org
Subject: RE: bug#7159: 24.0.50;
	(1) `file-name-(non)directory': bad return values,
	(2)	`directory-sep-char'
Date: Tue, 5 Oct 2010 07:58:23 -0700
> What other real-life use-cases exist that require such a
> functionality?

Ah, the "real world" argument.  The old no-one-would-ever-do-that refrain. Why
would a user or a program ever try to decompose a file name that contains
backslashes?

Answers: (1) Users and programs will always do what you don't expect - and
sometimes there is no reason they shouldn't.  (2) Occam's razor: There is no
reason to special-case backslash here: slash alone is sufficient.  If not true,
then treat ?\ as a dir separator in Emacs on Unix also.

> > You say Emacs must recognize ?\ today at least, because 
> > mumblemumble things are complicated.  I say that even if
> > that is so (and I believe you that it is),
> > that's not the same as claiming that that _should_ be so.  
> > This is a bug, a poor design/implementation decision, that
> > we can hope to fix at some point.
> 
> It isn't a bug, it's a feature that is necessary on DOS and Windows.

What's not necessary is treating ?\ as a dir separator in Emacs, even on
Windows.  That's clear.

> "Fixing" that would introduce bugs, some subtle, others glaring.  So
> these primitives, which are widely used in Emacs's own Lisp sources,
> must retain their equal handling of both flavors of slashes in file
> names.  If you still disagree, let's leave it at that, because we will
> never agree.

Yes, I still disagree - or rather you do.  That is why I asked to leave the bug
open in hopes that someone else will eventually fix it.  It's clear that you
disagree that there is a bug.

Those places you refer to can be fixed.  Not fixing the standard decomposer
functions makes the bug a self-fulfilling prophecy:  Of course these functions
will be widely used with no intermediary in places where backslashes are
present.  Today, they interpret backslashes, so naturally the mess is
widespread.

This can be cleaned up progressively:

1. The first step is to remove mention of ?\ from the doc for these functions,
and thus not encourage people to depend on this behavior.

2. The second step is to start fixing code where these functions are used
directly in contexts where ?\ might enter Emacs within a file name, translating
such an outside format by replacing ?\ with ?/.

3. The final step is to stop those functions from recognizing ?\ as a dir
separator.  Only in this final step will any remaining bugs surface: places
where we neglected to clean up the file name before passing it to these
functions.

If the second step is done right then the transition will be "seamless".  Any
bugs uncovered after fixing the function definitions (code) will be the
exception.
 
> > The answer, for Emacs, should simply be to interpret the 
> > chars other than ?/ as file-name chars, not as directory
> > separators.  It has nothing to do with interpreting regexp syntax.
> > It has only to do with interpreting a directory separator.
> 
> But the above output doesn't make sense.

It does if that is the argument passed.  We have agreed that the argument need
not name an existing file.  These functions should be robust and simple enough
that they do nothing other than split the string at the last directory
separator.
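
A minimal sketch of that behavior, assuming ?/ is the only directory separator.
The `sketch-' names are hypothetical, not existing Emacs functions, and the
sketch ignores drive letters and other details that the real primitives handle:

```elisp
;; Hypothetical helpers: split a string at its last `/' and nothing else.
;; A backslash is treated as an ordinary file-name character.

(defun sketch-file-name-directory (name)
  "Return the part of NAME up to and including its last `/', or nil if none."
  ;; The regexp matches the first `/' that has no later `/', i.e. the last one.
  (let ((pos (string-match-p "/[^/]*\\'" name)))
    (and pos (substring name 0 (1+ pos)))))

(defun sketch-file-name-nondirectory (name)
  "Return the part of NAME after its last `/', or all of NAME if none."
  (let ((pos (string-match-p "/[^/]*\\'" name)))
    (if pos (substring name (1+ pos)) name)))
```

With these definitions, (sketch-file-name-nondirectory "dir/a\\b.txt") returns
"a\\b.txt" on every platform, because only ?/ is interpreted.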

> The result is by no means what file-name-directory and
> file-name-nondirectory are documented to produce.
> And the reason is that the argument is not a file name.

You already agreed that the argument need not be a file name.  It might not name
an existing file.  And the functions should be robust and simple enough to DTRT
with any string, even a string that could not possibly be a file name.

But that latter part is a bit beside the point here.  Nothing in the bug report
_requires_ this to be about names that could never name a file.  This is only
about names that contain backslashes.  It is only about not having these
functions treat ?\ as a dir separator.

> So why does what you are asking make sense, and when will it be
> useful in Emacs in practice?
> 
> > > It's not different.  These functions are used all the 
> > > time for parsing file names, including those in output
> > > of other programs.
> > 
> > But they should be used to parse Emacs file names (i.e., 
> > names that use only ?/ as dir separator), nothing more.
> 
> No.  They are designed to parse file names that are valid on the
> underlying OS.

That's where we disagree.  I can't speak to their original intent, but what they
should do is retrieve the directory and non-directory portions of an _Emacs_
file name, where the latter means a name that uses ?/ as dir separator.

These functions should know nothing about the underlying OS.  They should be
handed _Emacs_ file names, that is, names with ?/ as the dir separator.

> > That's all I'm suggesting: keep `file-name-(non)directory' 
> > for Emacs file names, where that notion is platform-independent
> > wrt dir separator.  Use other code as needed to translate to
> > names that use only ?/ as dir separator.
> > 
> > > There are also users who use backslashes in their 
> > > ~/.emacs files, when they specify file names and programs.
> > 
> > So what?  Again, however & whenever Emacs receives such 
> > names, it can use code that translates them to Emacs file
> > names (names with ?/ as dir separator).
> 
> But we already use that "other code": these two primitives (and
> others) which DTRT with any file name that is valid on Windows.
> There's no need to change anything.

It should be clear from my use above that by "other" I meant "other than these
functions".  These standard functions should be only for operating on Emacs file
names.  "Other" means platform-specific translation to platform-independent
Emacs file names (with ?/ separators).  

> It appears that you are asking for an additional set of functions,
> which ignore backslashes on Windows.

No, I am asking for _these_ functions to stop being special-cased according to
the platform, to stop treating ?\ as dir separator on Windows (since ?/ works on
Windows too).

This is about separating out the platform-specific treatment to only the places
where it is needed, and having the standard functions that access parts of a
file name use and expect the standard Emacs file-name syntax: ?/ as dir
separator.

> If such functions are to become
> part of Emacs, we need to hear the practical use-cases where they
> would be useful.  You presented a single example, which you now say
> was not relevant.  Please present relevant examples that would justify
> yet another set of file-name APIs.  Otherwise, you can always write
> such functions yourself, it's hardly a big job.
> 
> Btw, I suggest to move the rest of the discussion to emacs-devel, as
> it's no longer relevant to the original bug report.  That mailing list
> has more subscribers than the bug-reporting list, who may contribute
> to the discussion.

Just leave the bug open please.  Mark it as "wishlist" if such is your wont.  I
don't have time to argue anymore about this.  I've made my argument here clear.

> > > If you do this, you will flood the application sources with ugly
> > > system-dependent conditions.  Hardly a good idea.
> > 
> > I didn't say that the "application sources" should do that 
> > (though I'm not sure what you mean by that term).
> > 
> > I said that if some Emacs code expects a "file name" in 
> > some format different from the standard Emacs syntax - i.e.,
> > with some other directory separator, then specialized _Emacs_
> > code that recognizes such a format can translate it to the
> > standard format: replace ?\ or another directory separator 
> > by ?/, the directory separator used by Emacs.
> 
> This code will be specific to Windows, and will clutter Lisp
> application-level sources, such as gud.el, grep.el, compile.el,
> etc. with the kinds of `(if (eq system-type 'windows-nt)
> fix-file-names-for-drew)'.  That's ugly and unergonomic.
> The current situation, though not ideal, is much better.

(standard-file-name the-input) is exactly what we should have.  It shows clearly
what is involved.  Such a function is where the (if (eq system-type 'windows-nt)
replace-\-by-/) test would live.

The point is that at places in the code where you see (file-name-directory
(standard-file-name the-input)) it will be clear that `the-input' might not be
in the standard Emacs format (with only ?/ as dir separator).

More typically, such places will call `standard-file-name' only once to convert
the external input once and for all.  From then on, Emacs will be dealing only
with a standard file name.  Everywhere you do not see `standard-file-name' you
will be sure that the file name is an Emacs name (?/ as separator).
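
A minimal sketch of such a function.  `standard-file-name' is only the name
proposed in this message, not an existing Emacs function, and including
`ms-dos' alongside `windows-nt' is an assumption:

```elisp
;; Hypothetical: `standard-file-name' does not exist in Emacs.
(defun standard-file-name (name)
  "Return NAME in standard Emacs form, with `/' as the only dir separator.
On MS-Windows and MS-DOS, replace each backslash by `/'.  On other
systems return NAME unchanged, since a backslash there is an ordinary
file-name character."
  (if (memq system-type '(windows-nt ms-dos))
      ;; "\\\\" is the regexp for one literal backslash.  The two `t'
      ;; arguments request a fixed-case, literal replacement.
      (replace-regexp-in-string "\\\\" "/" name t t)
    name))
```

A call such as (file-name-directory (standard-file-name the-input)) then makes
the platform-specific step visible at the one place where it happens.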

> > > The special-case code has to be somewhere.  Having it at 
> > > the current low level in C, hidden from the Lisp programs,
> > > is the best we can do.
> > 
> > I agree that it has to be somewhere.  And I recognize that 
> > you are far more familiar with the Emacs implementation
> > (and with Windows) than I.  My point is that there is no
> > logical reason why the _standard_, _general_ Emacs functions
> > for decomposing file names (within Emacs) should have to 
> > recognize two different chars as directory separators.
> 
> Yes, there's a perfectly valid reason: because these primitives are
> used everywhere in Emacs packages, and those packages don't want to
> know about differences in file format between Posix and Windows
> platforms.

They need not know.  Only places where an external-format name is introduced
into Emacs need call `standard-file-name' (or whatever name is used).  And even
if those places are also numerous, they need call it only once.

Once the file name has been converted to the standard Emacs form (only ?/ as
separator), it can travel on its merry way throughout Emacs, with no code
needing to worry about anything platform-dependent in the name.

> Again, we do that in a lot of places, most of which I don't even
> remember.  The reason I can safely forget about them is _precisely_
> that Lisp code doesn't have to worry about these issues, because the
> primitives DTRT.

You don't know where they are, and cannot tell, precisely because there is no
explicit call to a function that translates to the standard form.

If every such location where an external format might enter Emacs had a call to
`standard-file-name' then (a) we would easily recognize those places and (b) all
other code would be sure to be dealing with simple, standard, Emacs file names.

It's not about you remembering all such locations.  It's about identifying them
clearly, making _them_ alone do the translation, explicitly.

> But here's one more example I just recalled: type
> "M-x getenv RET PATH RET" and look at the value.  I'm sure if I think
> more, I will recall more examples.  But why waste energy on a problem
> that doesn't exist?

To convert a PATH you need only iterate wrt `path-separator', calling
`standard-file-name' on each path component.  Again, doing that makes it
explicit at that point in the code that what is being handled is a list of file
names that are not necessarily in standard form.
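
A rough sketch of that iteration.  The function name and its CONVERT argument
are hypothetical.  CONVERT stands in for `standard-file-name' (or whatever name
is used), and is passed explicitly only so that the sketch is self-contained:

```elisp
;; Hypothetical helper: convert each component of a PATH-style string.
(defun sketch-standardize-path (path convert &optional separator)
  "Split PATH at SEPARATOR and return the list of CONVERT applied to each part.
SEPARATOR defaults to the value of `path-separator'.  Empty components
are kept, since an empty PATH entry can be meaningful."
  (mapcar convert
          (split-string path (regexp-quote (or separator path-separator)))))

;; Example with a Windows-style value and an explicit ";" separator:
;; (sketch-standardize-path
;;  "C:\\bin;D:\\tools"
;;  (lambda (f) (replace-regexp-in-string "\\\\" "/" f t t))
;;  ";")
;; returns ("C:/bin" "D:/tools")
```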

We can agree to disagree - I don't think we're going to convince each other.
Please leave the bug report open, for possible consideration in the future or by
others.





This bug report was last modified 14 years and 234 days ago.
