GNU bug report logs - #29832
ls: bug in 'ls -FQ': incorrectly quoted characters

Previous Next

Package: coreutils;

Reported by: hackerb9 <at> member.fsf.org

Date: Sun, 24 Dec 2017 07:15:01 UTC

Severity: normal

To reply to this bug, email your comments to 29832 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#29832; Package coreutils. (Sun, 24 Dec 2017 07:15:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to hackerb9 <at> member.fsf.org:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sun, 24 Dec 2017 07:15:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: hackerb9 <at> member.fsf.org
To: bug-coreutils <at> gnu.org
Subject: Bug in 'ls -FQ': incorrectly quoted characters
Date: Sat, 23 Dec 2017 23:14:28 -0800
[Message part 1 (text/plain, inline)]
Hello!

Thank you for your continued improvements and maintenance of 'ls' in
coreutils-8.28.

I have found a somewhat subtle bug. I've included a patch and a test script
to make sure there are no regressions.

In short, the problem is that a backlash is incorrectly put in front of the
characters @, =, >, and |. This only happens when using filetype indicators
and C style quoting.

  $ touch @
  $ ls
  @
  $ ls -F
  @
  $ ls -Q
  "@"
  $ ls -FQ
  "\@"        # <-- incorrect

There should be no backslash before the @.

This was a little bit tricky to track down as the code responsible is not
commented and appears to have been prematurely optimized by the programmer:

  ls.c:2141
  if (file_type <= indicator_style)
    {
      char const *p;
      for (p = &"*=>@|"[indicator_style - file_type]; *p; p++)
        set_char_quoting (filename_quoting_options, *p, 1);
    }

Basically, it says, if --file-type was used, then we should be sure to
quote '=', '>', and '@'. If --classify is used, then we should do the same
but also add in '*'. Or, to put it in less compressed C:

  if (indicator_style == file_type)
    {
      char const *p;
      for (p = "=>@|"; *p; p++)
        set_char_quoting (filename_quoting_options, *p, 1);
    }
  if (indicator_style == classify)
    {
      char const *p;
      for (p = "*=>@|"; *p; p++)
        set_char_quoting (filename_quoting_options, *p, 1);
    }

However, there is no need to quote those characters when the quoting-style
is set to C. In fact, it is an error to do so as these are not valid C
escape sequences.

I am guessing the idea of the code was that those characters would not be
backslash escaped, but rather the entire filename would be wrapped in
quotes so the indicator character cannot be confused for part of the
filename. And, indeed, this does work for some of the quoting styles:
"shell[-escape][-always]" work that way. Strangely, "c-maybe" also works
that way.

  $ ls -F --quoting-style=literal
  @
  $ ls -F --quoting-style=shell
  '@'
  $ ls -F --quoting-style=shell-always
  '@'
  $ ls -F --quoting-style=shell-escape
  '@'
  $ ls -F --quoting-style=shell-escape-always
  '@'
  $ ls -F --quoting-style=c-maybe
  "@"        # <-- This is correct!

I spent some time debugging it, but I didn't find the perfect fix as I
don't understand why it was written the way it was. Rather, I found
something that I think is an improvement and a step towards doing things
right. I've made a patch that simply checks if the quoting style is C and
skips the problematic lines. Ideally, somebody who understands why this
code was written this way in the first place can make a better patch, but
this should work for now and not break anything for anybody else.

Please find attached below, my patch for src/ls.c and a regression test
script to be placed in tests/ls/quoting-style-c.sh.

--b9

P.S. This bug may also apply to the -b (--quoting-style=escape). However,
the output from it appears to be more like a mish-mash of C and shell
escapes. (For example, tabs are shown as "\t", as in C, but spaces are
shown as "\ ", as in bourne shell. ) The -Q option, on the other hand,
generates perfectly acceptable C strings (tested with gcc and python),
except for those few buggy characters fixed in this patch.
[Message part 2 (text/html, inline)]
[ls-FQ.patch (text/x-patch, attachment)]
[quoting-style-c.sh (application/x-sh, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#29832; Package coreutils. (Sun, 24 Dec 2017 12:27:02 GMT) Full text and rfc822 format available.

Message #8 received at 29832 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: hackerb9 <at> member.fsf.org, 29832 <at> debbugs.gnu.org
Subject: Re: bug#29832: Bug in 'ls -FQ': incorrectly quoted characters
Date: Sun, 24 Dec 2017 12:26:30 +0000
On 24/12/17 07:14, hackerb9 <at> member.fsf.org wrote:
> Hello!
> 
> Thank you for your continued improvements and maintenance of 'ls' in
> coreutils-8.28.
> 
> I have found a somewhat subtle bug. I've included a patch and a test script
> to make sure there are no regressions.
> 
> In short, the problem is that a backlash is incorrectly put in front of the
> characters @, =, >, and |. This only happens when using filetype indicators
> and C style quoting.
> 
>   $ touch @
>   $ ls
>   @
>   $ ls -F
>   @
>   $ ls -Q
>   "@"
>   $ ls -FQ
>   "\@"        # <-- incorrect
> 
> There should be no backslash before the @.
> 
> This was a little bit tricky to track down as the code responsible is not
> commented and appears to have been prematurely optimized by the programmer:
> 
>   ls.c:2141
>   if (file_type <= indicator_style)
>     {
>       char const *p;
>       for (p = &"*=>@|"[indicator_style - file_type]; *p; p++)
>         set_char_quoting (filename_quoting_options, *p, 1);
>     }
> 
> Basically, it says, if --file-type was used, then we should be sure to
> quote '=', '>', and '@'. If --classify is used, then we should do the same
> but also add in '*'. Or, to put it in less compressed C:
> 
>   if (indicator_style == file_type)
>     {
>       char const *p;
>       for (p = "=>@|"; *p; p++)
>         set_char_quoting (filename_quoting_options, *p, 1);
>     }
>   if (indicator_style == classify)
>     {
>       char const *p;
>       for (p = "*=>@|"; *p; p++)
>         set_char_quoting (filename_quoting_options, *p, 1);
>     }
> 
> However, there is no need to quote those characters when the quoting-style
> is set to C. In fact, it is an error to do so as these are not valid C
> escape sequences.
> 
> I am guessing the idea of the code was that those characters would not be
> backslash escaped, but rather the entire filename would be wrapped in
> quotes so the indicator character cannot be confused for part of the
> filename. And, indeed, this does work for some of the quoting styles:
> "shell[-escape][-always]" work that way. Strangely, "c-maybe" also works
> that way.
> 
>   $ ls -F --quoting-style=literal
>   @
>   $ ls -F --quoting-style=shell
>   '@'
>   $ ls -F --quoting-style=shell-always
>   '@'
>   $ ls -F --quoting-style=shell-escape
>   '@'
>   $ ls -F --quoting-style=shell-escape-always
>   '@'
>   $ ls -F --quoting-style=c-maybe
>   "@"        # <-- This is correct!
> 
> I spent some time debugging it, but I didn't find the perfect fix as I
> don't understand why it was written the way it was. Rather, I found
> something that I think is an improvement and a step towards doing things
> right. I've made a patch that simply checks if the quoting style is C and
> skips the problematic lines. Ideally, somebody who understands why this
> code was written this way in the first place can make a better patch, but
> this should work for now and not break anything for anybody else.
> 
> Please find attached below, my patch for src/ls.c and a regression test
> script to be placed in tests/ls/quoting-style-c.sh.
> 
> --b9
> 
> P.S. This bug may also apply to the -b (--quoting-style=escape). However,
> the output from it appears to be more like a mish-mash of C and shell
> escapes. (For example, tabs are shown as "\t", as in C, but spaces are
> shown as "\ ", as in bourne shell. ) The -Q option, on the other hand,
> generates perfectly acceptable C strings (tested with gcc and python),
> except for those few buggy characters fixed in this patch.

Thanks for the detailed analysis.

c-maybe is correct here as quotearg explicitly ignores these extra chars
to be quoted when outer quotes are added.  Therefore we should
probably ignore these extra chars to be quoted when we're adding
outer quotes unconditionally.

This gnulib patch would handle this specific case.
I'll look into generalizing for all outer quoted cases:

diff --git a/lib/quotearg.c b/lib/quotearg.c
index 8e432e1..47ffc5c 100644
--- a/lib/quotearg.c
+++ b/lib/quotearg.c
@@ -709,7 +709,9 @@ quotearg_buffer_restyled (char *buffer, size_t buffersize,
           }
         }

-      if (! (((backslash_escapes && quoting_style != shell_always_quoting_style)
+      if (! (((backslash_escapes
+               && quoting_style != shell_always_quoting_style
+               && quoting_style != c_quoting_style)
               || elide_outer_quotes)
              && quote_these_too
              && quote_these_too[c / INT_BITS] >> (c % INT_BITS) & 1)

cheers,
Pádraig




Changed bug title to 'ls: bug in 'ls -FQ': incorrectly quoted characters' from 'Bug in 'ls -FQ': incorrectly quoted characters' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 02:23:02 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 233 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.