Package: emacs
Reported by: "Drew Adams" <drew.adams <at> oracle.com>
Date: Mon, 4 Oct 2010 17:59:02 UTC
Severity: minor
Tags: wontfix
Found in version 24.0.50
Done: "Drew Adams" <drew.adams <at> oracle.com>
Bug is archived. No further changes may be made.
Message #17 received at 7159-done <at> debbugs.gnu.org (full text, mbox):
From: "Drew Adams" <drew.adams <at> oracle.com>
To: "'Eli Zaretskii'" <eliz <at> gnu.org>
Cc: 7159-done <at> debbugs.gnu.org
Subject: RE: bug#7159: 24.0.50; (1) `file-name-(non)directory': bad return values, (2) `directory-sep-char'
Date: Mon, 4 Oct 2010 15:22:26 -0700

> > (file-name-directory titi)    ; gives "c:/foo/bar/b[^^@]*\\.el\\"
> > (file-name-nondirectory titi) ; gives "'"
> >
> > These functions should know how to parse titi to produce
> > "c:/foo/bar/" and "b[^^@]*\\.el\\'", respectively (where ^@
> > is the control char).
>
> You are forgetting the backslashes that wildcard-to-regexp inserted.

It should be obvious that I am NOT forgetting such backslashes.

> On DOS and Windows, Emacs treats backslashes as directory separators,
> as you'd expect. So "c:/foo/bar/b[^^@]*\\.el\\" looks like a leading
> directory of a file whose basename is "'".

No. Well, let me put it another way: That is just what this bug report
is about: Backslashes are NOT directory separators for Emacs - or at
least they should not be. Even on Windows. This bug report says we
should get rid of any such vestigial treatment.

As the doc string for `directory-sep-char' indicates, ?/ is the only
directory separator for Emacs - or at least it should be.

  "Directory separator character for built-in functions that return
   file names. The value is always ?/."

It says "return" rather than "accept or return" because such functions
don't yet DTRT wrt input. (And it says "built-in" rather than
"standard", which would be better.)

> In other words, don't pass a regexp with backslashes to these
> functions, because you won't get what you think you will.

Correction: You won't get what you should get, which is just the
directory or non-directory portion of the name, respecting ?/ as the
only separator. And it's not just about regexps - I used that as an
example of a name that included a backslash. The point was that
file-name decomposition functions should pay no attention to
backslashes. There is no reason they should consider ?\ to be a
directory separator.

After fixing this we will also be able to remove this parenthetical
phrase in the Elisp manual: "(backslash is also allowed in input on
MS-DOS or MS-Windows)".
This is the _only_ (whispered, parenthetical) mention of such a
vestigial crutch.

> > So I suspect that the `file-name-nondirectory' part of this bug
> > is at least in part a Windows problem. The code seems to be
> > interpreting the backslash (?\) near the end as a directory
> > separator.
>
> It does, by design.

Bad design, if so. More likely it is a vestige. Perhaps it seemed like
the best or the only possible thing to do at the time, but it is not
TRT.

> > If so, that is definitely wrong. Even on Windows, the
> > code should use the value of `directory-sep-char', which is ?/,
> > not ?\.
>
> On Windows, we support both, and we always will. Anything else means
> a terrible breakage, believe me. For example, it would be very hard
> to parse output of programs that emit file name with backslashes.

Parsing output of programs is something altogether different. You
should not throw that in here. Emacs standard functions for decomposing
file names should not be tainted with an eye to parsing arbitrary
Windows program output. That is a completely different requirement and
should be handled, naturally, by special-purpose code (i.e. at a
different level) - code that knows just what to expect from those
particular programs.

We can have code in Emacs that parses many different kinds of output,
including Windows file names. But the need for such special-purpose
parsing code is unrelated to general, standard functions that expect a
file name. In Emacs, such functions should not treat backslashes as
directory separators. There is no need for that.

Why? Because ?/ as dir separator works fine for Emacs code even in
Windows. And because ?/ works always, we should use ONLY ?/.

What is the real requirement to support also ?\? Please don't say that
it is handling the output from some Windows programs - that is a red
herring.

Note that this is very different from the path-separator (";" for
Windows, ":" for UNIX). In that case, ":" does NOT work for Emacs on
Windows - there is no canonical separator. But for directories, ?/
_always works_, and it should therefore be the only char recognized as
a dir separator. For general file-name functions, that is. Nothing
prevents some specialized Windows parsing code from processing Windows
file names that use ?\ (e.g. creating a file name that uses the
standard separator, which can then be handled in the standard way).

> With the current setup, this is seamless,

Well, it's apparently been hard-coded here and there to such an extent
that you are screaming that there would be a lot to change to clean it
up. That in itself is a hefty price for such "seamlessness". But the
real price is the loss of simple standard functions for manipulating
file names correctly. By pushing special-purpose parsing into the code
everywhere you might think things have been made "seamless", but in
fact a muddy mess has been created.

Emacs's handling of ?\ in a file name output by an external program
should proceed in two stages: (1) translation to an Emacs file name (if
needed), which means using ?/ as separator, then (2) handling of the
Emacs file name using the standard file-name functions (e.g.
`file-name-directory'). That's the clean way to handle such
special-casing. (And any such use of special-case parsing should be the
exception, not the rule.)

> even if the file names use mixed forward- and back-slashes (yes, it
> happens with GCC and GDB, for example, or even with Make sometimes).

Again, there is nothing wrong with having specialized code that handles
such cases on an individual basis, if they require it. But the general
file-name handling code of Emacs should handle _Emacs_ file names,
which use only ?/ as the dir separator.

You are muddying the waters by throwing in lots of other stuff here. Of
_course_ it can happen that some program might need to parse special
syntax - any special syntax. But this is about the normal Emacs syntax
for file names.
And for that syntax the Emacs directory separator is ?/.

If some particular Emacs code is forced by some other code (e.g. GDB)
to digest a name that uses both ?/ and ?\ as directory separators
(quelle horreur), then appropriate Emacs code can be used to fix such
names before Emacs tries to deal with them using the standard file-name
functions (e.g. `file-name-directory').

IOW, tack a translation mapping onto the output of GDB or Make or
whatever to standardize such bastard file names (w/ mixed separators).
That can be done by Emacs, but we should not foul the standard Emacs
file-name handling with such considerations.

"Seamless", indeed. Putting special-case handling throughout the code
doesn't make things seamless; it makes them quite seamy.

> > However, I see from the doc string that `directory-sep-char' has
> > been made obsolete:
>
> In fact, just yesterday it was removed altogether, because it has no
> effect on what Emacs does. That's been like that for years, and we
> saw no complaints.

The complaint/suggestion wrt `directory-sep-char' is only that it
should be a constant. We should not be advising people to hard-code ?/,
but rather to use a constant with a name that proclaims what it is and
with a value of ?/. But this is only a minor, stylistic concern. It is
not directly related to this bug.

> I'm closing this bug.

I'm reopening it. To me, this is broken, and this dysfunction is not an
inevitable price to be paid because GDB or whatever outputs Windows
file names using backslashes. That argument is a copout.

The simple functions `file-name-directory' and `file-name-nondirectory'
should be robust enough to just remove the non-directory and directory
portion - always. That should be so regardless of the presence of
backslashes. Those functions are broken on Windows when backslashes are
present.

If you don't want to fix this bug, fine; maybe someone else will
someday. Maybe you don't want to make the effort required to remove
such ad-hoc backslash handling here and there from the Windows Emacs
code, but maybe someone else will someday.

I believe you that the effort might be great, and I accept that
therefore this cannot be a high priority now (there are _many_
outstanding bugs). But that does not mean that we currently handle
Windows file names correctly. That we choose not to fix something now
does not imply that it doesn't need fixing.

Your reason, "Anything else means a terrible breakage, believe me"
suggests that the fix is non-trivial because there is (apparently) lots
of code here and there that still special-cases backslashes, on
Windows.

Your example of such breakage, "to parse output of programs that emit
file name with backslashes", suggests that you do not distinguish
parsing Windows program output from Emacs's general-purpose file-name
handling functions.

It is not right to mess up general-purpose file-name-handling functions
just for the benefit of some special-purpose Windows-output parsing
here and there. Write Windows-output specific code to do that according
to the particular case (need), and make the general-purpose file-name
handling functions do as they logically should: recognize ?/ as the
only directory separator.
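
The two-stage handling argued for above (translate the external name
first, then use the standard functions) might be sketched as follows.
This is only an illustration under stated assumptions, not code from
Emacs: the names `my-external-name-to-emacs' and
`my-split-external-name' are hypothetical, and the translation assumes
the caller already knows the name came from a Windows program, since a
backslash can be an ordinary file-name character elsewhere.

```elisp
;; Stage 1 (hypothetical helper, not part of Emacs): turn a file name
;; emitted by a Windows program into an Emacs file name by rewriting
;; every backslash as a forward slash.
(defun my-external-name-to-emacs (name)
  "Return NAME with each backslash replaced by a forward slash."
  (subst-char-in-string ?\\ ?/ name))

;; Stage 2 (also hypothetical): decompose the translated name with the
;; standard file-name functions, which now see only ?/ as separator.
(defun my-split-external-name (name)
  "Return (DIRECTORY . NONDIRECTORY) for the external file NAME."
  (let ((emacs-name (my-external-name-to-emacs name)))
    (cons (file-name-directory emacs-name)
          (file-name-nondirectory emacs-name))))

;; A name with mixed separators, as a compiler or debugger might emit:
(my-split-external-name "c:\\foo/bar\\baz.el")
;; => ("c:/foo/bar/" . "baz.el")
```

Because the translation happens before `file-name-directory' and
`file-name-nondirectory' are called, the sketch gives the same result
whether or not those functions themselves treat ?\ as a separator.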