GNU bug report logs - #21702
shell-quote-argument semantics and safety

Package: emacs;

Reported by: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)

Date: Sun, 18 Oct 2015 12:37:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 21702 in the body.
You can then email your comments to 21702 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Sun, 18 Oct 2015 12:37:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer):
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 18 Oct 2015 12:37:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
To: bug-gnu-emacs <at> gnu.org
Subject: shell-quote-argument semantics and safety
Date: Sun, 18 Oct 2015 14:36:03 +0200

[Message part 1 (text/plain, inline)]

The documentation of shell-quote-argument only says

    Quote ARGUMENT for passing as argument to an inferior shell.

It's unclear for which shells this is supposed to work.  In a recent
thread in emacs-devel, it has been demonstrated that if the result is
passed to csh, it can allow an attacker to execute an arbitrary shell
command, although without arguments:

    (let ((argument (read untrusted-source)))
      (assert (stringp argument))
      (call-process "csh" nil t nil "-c"
                    (concat "echo " (shell-quote-argument argument))))

    ;; If untrusted-source gives us "\nevil-command\n", we get:
    ;; evil-command: Command not found.

The function should clearly document

    1) for which shells will the quoting work absolutely, i.e. lead to
    the given string to appear *verbatim* in an element of the ARGV of
    the called command,

    2) optionally, for which shells will the quoting at least prevent
    code injection,

    3) optionally, for which shells and character sets for ARGUMENT will
    the quoting work absolutely,

    4) optionally, for which shells and character sets for ARGUMENT will
    the quoting at least prevent code injection,

    5) optionally, for which shells will the quoting work at all even if
    it provides no clear semantics, such that one can at least use it
    with data coming from trusted sources (e.g. other parts of Emacs's
    source code, or the user sitting in front of Emacs), where it's the
    user's/programmer's responsibility to stick to values for ARGUMENT
    that are intuitively known to be unproblematic even if the character
    set isn't well-defined.

Currently #5 seems to be implied for all shells, for lack of further
documentation.  Possibly, the function was never meant to be used with
untrusted data, but there's no warning against doing so either.

I stress-tested the strategy it uses for POSIX shells with the following
horrible hack; the results are positive, i.e. the strategy seems to meet
the criteria #1 above for POSIX shells.

for i in {0..999}
do
    dd if=/dev/urandom of=/dev/stdout bs=1K count=1 2>/dev/null |
        tr -d '\000' > randomfile  # NULL bytes in ARGV are impossible

    emacs -q --batch --eval \
          "(with-temp-buffer 
             (insert-file-contents-literally \"randomfile\")
             (let ((data
                    (replace-regexp-in-string
                      \"\\n\" \"'\\n'\"
                      (replace-regexp-in-string 
                        \"[^-0-9a-zA-Z_./\\n]\" 
                        \"\\\\\\\\\\\\&\" 
                        (buffer-substring (point-min) (point-max))))))
               (erase-buffer)
               (insert \"printf %s \")
               (insert data)
               (write-region (point-min) (point-max) \"commandfile\")))"

    sh - < commandfile > output  # tested with bash, dash, and ksh

    diff randomfile output || exit
done

There's also wording in POSIX which seems to guarantee the safety of the
strategy:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02_01

"A <backslash> that is not quoted shall preserve the literal value of
the following character, with the exception of a <newline>. [...]"


For now, here's a trivial patch improving the docstring.  If anyone is
confident in the safety of the function for shells other than those
conforming to POSIX sh, feel free to change the docstring accordingly.

[0001-lisp-subr.el-shell-quote-argument-Improve-documentat.patch (text/x-diff, inline)]

From dedcb603da981dcab8f576dea2f36d58fd2ddcfa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Taylan=20Ulrich=20Bay=C4=B1rl=C4=B1/Kammer?=
 <taylanbayirli <at> gmail.com>
Date: Sun, 18 Oct 2015 14:23:35 +0200
Subject: [PATCH] * lisp/subr.el (shell-quote-argument): Improve documentation.

---
 lisp/subr.el | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lisp/subr.el b/lisp/subr.el
index e176907..940ebe6 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -2711,7 +2711,11 @@ Note: :data and :device are currently not supported on Windows."
 (declare-function w32-shell-dos-semantics "w32-fns" nil)
 
 (defun shell-quote-argument (argument)
-  "Quote ARGUMENT for passing as argument to an inferior shell."
+  "Quote ARGUMENT for passing as argument to an inferior shell.
+
+This is safe for shells conforming to POSIX sh.  No guarantees
+regarding code injection are made for other shells, but csh,
+MS-DOS and Windows NT are supported for simple cases as well."
   (cond
    ((eq system-type 'ms-dos)
     ;; Quote using double quotes, but escape any existing quotes in
-- 
2.5.0

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Sun, 18 Oct 2015 15:27:02 GMT) Full text and rfc822 format available.

Message #8 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
To: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Sun, 18 Oct 2015 17:26:19 +0200

On the development list it has been pointed out that the Info manual
contains more verbose documentation on this function, although it
doesn't clarify the semantics much either.

=== snip ===
Lisp programs sometimes need to run a shell and give it a command that
contains file names that were specified by the user.  These programs
ought to be able to support any valid file name.  But the shell gives
special treatment to certain characters, and if these characters occur
in the file name, they will confuse the shell.  To handle these
characters, use the function ‘shell-quote-argument’:

 -- Function: shell-quote-argument argument
     This function returns a string that represents, in shell syntax, an
     argument whose actual contents are ARGUMENT.  It should work
     reliably to concatenate the return value into a shell command and
     then pass it to a shell for execution.

     Precisely what this function does depends on your operating system.
     The function is designed to work with the syntax of your system’s
     standard shell; if you use an unusual shell, you will need to
     redefine this function.

          ;; This example shows the behavior on GNU and Unix systems.
          (shell-quote-argument "foo > bar")
               ⇒ "foo\\ \\>\\ bar"

          ;; This example shows the behavior on MS-DOS and MS-Windows.
          (shell-quote-argument "foo > bar")
               ⇒ "\"foo > bar\""

     Here’s an example of using ‘shell-quote-argument’ to construct a
     shell command:

          (concat "diff -c "
                  (shell-quote-argument oldfile)
                  " "
                  (shell-quote-argument newfile))
=== /snip ===

I'm not sure if that needs change, given the change to the docstring,
which counts as the more authoritative documentation of the precise
semantics if I'm not mistaken.

Taylan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Sun, 18 Oct 2015 17:17:01 GMT) Full text and rfc822 format available.

Message #11 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Sun, 18 Oct 2015 20:16:54 +0300

> From: taylanbayirli <at> gmail.com (Taylan Ulrich
> 	Bayırlı/Kammer)
> Date: Sun, 18 Oct 2015 14:36:03 +0200
> 
> The documentation of shell-quote-argument only says
> 
>     Quote ARGUMENT for passing as argument to an inferior shell.
> 
> It's unclear for which shells this is supposed to work.

I fixed the doc string to clarify that this function works correctly
with the system's standard shell.

> In a recent thread in emacs-devel, it has been demonstrated that if
> the result is passed to csh, it can allow an attacker to execute an
> arbitrary shell command

As I understand it, csh is not the standard shell on Posix systems, so
the fixed doc string now says not to expect it to work with csh.

> The function should clearly document
> 
>     1) for which shells will the quoting work absolutely, i.e. lead to
>     the given string to appear *verbatim* in an element of the ARGV of
>     the called command,
> 
>     2) optionally, for which shells will the quoting at least prevent
>     code injection,
> 
>     3) optionally, for which shells and character sets for ARGUMENT will
>     the quoting work absolutely,
> 
>     4) optionally, for which shells and character sets for ARGUMENT will
>     the quoting at least prevent code injection,
> 
>     5) optionally, for which shells will the quoting work at all even if
>     it provides no clear semantics, such that one can at least use it
>     with data coming from trusted sources (e.g. other parts of Emacs's
>     source code, or the user sitting in front of Emacs), where it's the
>     user's/programmer's responsibility to stick to values for ARGUMENT
>     that are intuitively known to be unproblematic even if the character
>     set isn't well-defined.
> 
> Currently #5 seems to be implied for all shells, for lack of further
> documentation.  Possibly, the function was never meant to be used with
> untrusted data, but there's no warning against doing so either.

I thin 1) is now covered, and the rest are optional.  In particular,
our way to provide better safety is not by documenting APIs that could
be maliciously abused, but by marking the related variables as unsafe
unless they have special predefined values.  So I don't think we
should extend this particular doc string with safety information.

>  (defun shell-quote-argument (argument)
> -  "Quote ARGUMENT for passing as argument to an inferior shell."
> +  "Quote ARGUMENT for passing as argument to an inferior shell.
> +
> +This is safe for shells conforming to POSIX sh.  No guarantees
> +regarding code injection are made for other shells, but csh,
> +MS-DOS and Windows NT are supported for simple cases as well."

Thanks, but I think this is no longer needed, after the change I made.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Sun, 18 Oct 2015 19:13:02 GMT) Full text and rfc822 format available.

Message #14 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Sun, 18 Oct 2015 21:12:34 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: taylanbayirli <at> gmail.com (Taylan Ulrich
>> 	Bayırlı/Kammer)
>> Date: Sun, 18 Oct 2015 14:36:03 +0200
>> 
>> The documentation of shell-quote-argument only says
>> 
>>     Quote ARGUMENT for passing as argument to an inferior shell.
>> 
>> It's unclear for which shells this is supposed to work.
>
> I fixed the doc string to clarify that this function works correctly
> with the system's standard shell.
>
>> In a recent thread in emacs-devel, it has been demonstrated that if
>> the result is passed to csh, it can allow an attacker to execute an
>> arbitrary shell command
>
> As I understand it, csh is not the standard shell on Posix systems, so
> the fixed doc string now says not to expect it to work with csh.
>
>> The function should clearly document
>> 
>>     1) for which shells will the quoting work absolutely, i.e. lead to
>>     the given string to appear *verbatim* in an element of the ARGV of
>>     the called command,
>> 
>>     2) optionally, for which shells will the quoting at least prevent
>>     code injection,
>> 
>>     3) optionally, for which shells and character sets for ARGUMENT will
>>     the quoting work absolutely,
>> 
>>     4) optionally, for which shells and character sets for ARGUMENT will
>>     the quoting at least prevent code injection,
>> 
>>     5) optionally, for which shells will the quoting work at all even if
>>     it provides no clear semantics, such that one can at least use it
>>     with data coming from trusted sources (e.g. other parts of Emacs's
>>     source code, or the user sitting in front of Emacs), where it's the
>>     user's/programmer's responsibility to stick to values for ARGUMENT
>>     that are intuitively known to be unproblematic even if the character
>>     set isn't well-defined.
>> 
>> Currently #5 seems to be implied for all shells, for lack of further
>> documentation.  Possibly, the function was never meant to be used with
>> untrusted data, but there's no warning against doing so either.
>
> I thin 1) is now covered, and the rest are optional.  In particular,
> our way to provide better safety is not by documenting APIs that could
> be maliciously abused, but by marking the related variables as unsafe
> unless they have special predefined values.  So I don't think we
> should extend this particular doc string with safety information.

Hm, there seems to be a fundamental difference in mindset here in how
one might use Elisp.

I'd like to point out that (in the most extreme cases) people have
actually been writing web servers and other such programs in Elisp for
which one would normally use a general-purpose language.

That is, "APIs that could be maliciously abused" is not the right way to
look at it.  It's not about the Elisp programmer abusing the API, it's
about a malicious data source exploiting a (potential) flaw in an Elisp
function, which Elisp programmers have relied on and thus made their
programs vulnerable to code injection.

That's why I was being so careful with regard to the safety guarantees
of the "shell-quasiquote" package I contributed.  I would like people to
be able to use that as part of a general-purpose Elisp language, and so
being safe against code injection is an absolute must.  They might after
all use it as part of a network-facing service.

Actually that might also apply when using e.g. TRAMP, which also
communicates with remote hosts and is a normal part of Emacs.  I've been
told it receives file names from remote hosts and passes them through
shell-quote-argument before giving them to a shell.  So maybe my
concerns apply there as well.

Given that, "I think 1) is now covered" is not very relieving to hear.
It amounts to "I think this is safe against code injection" which is
rather alarming to hear.  Either it's very confidently known to be safe
and so one may use it for network-facing code, or it's not confidently
known to be safe and so one shouldn't use it for network-facing code.
This should be documented clearly especially so that users who aren't
very aware of injection attacks won't nonchalantly use the function for
their network-facing code (when the function isn't known to be safe for
this), but also so that users who are aware of such issues know they can
use the function and don't instead invent their own thing (when it is
known to be safe).

Does that make sense?

Taylan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Sun, 18 Oct 2015 19:49:01 GMT) Full text and rfc822 format available.

Message #17 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Sun, 18 Oct 2015 22:48:15 +0300

> From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
> Cc: 21702 <at> debbugs.gnu.org
> Date: Sun, 18 Oct 2015 21:12:34 +0200
> 
> I'd like to point out that (in the most extreme cases) people have
> actually been writing web servers and other such programs in Elisp for
> which one would normally use a general-purpose language.
> 
> That is, "APIs that could be maliciously abused" is not the right way to
> look at it.  It's not about the Elisp programmer abusing the API, it's
> about a malicious data source exploiting a (potential) flaw in an Elisp
> function, which Elisp programmers have relied on and thus made their
> programs vulnerable to code injection.
> 
> 
> That's why I was being so careful with regard to the safety guarantees
> of the "shell-quasiquote" package I contributed.  I would like people to
> be able to use that as part of a general-purpose Elisp language, and so
> being safe against code injection is an absolute must.  They might after
> all use it as part of a network-facing service.
> 
> 
> Actually that might also apply when using e.g. TRAMP, which also
> communicates with remote hosts and is a normal part of Emacs.  I've been
> told it receives file names from remote hosts and passes them through
> shell-quote-argument before giving them to a shell.  So maybe my
> concerns apply there as well.
> 
> 
> Given that, "I think 1) is now covered" is not very relieving to hear.

Item 1 was this:

> >> The function should clearly document
> >> 
> >>     1) for which shells will the quoting work absolutely, i.e. lead to
> >>     the given string to appear *verbatim* in an element of the ARGV of
> >>     the called command,

There's nothing about safety here, only about correctness.  That is
the aspect that I think is now covered, as the doc string now says for
which shells one can have correct results.

> It amounts to "I think this is safe against code injection" which is
> rather alarming to hear.  Either it's very confidently known to be safe
> and so one may use it for network-facing code, or it's not confidently
> known to be safe and so one shouldn't use it for network-facing code.
> This should be documented clearly especially so that users who aren't
> very aware of injection attacks won't nonchalantly use the function for
> their network-facing code (when the function isn't known to be safe for
> this), but also so that users who are aware of such issues know they can
> use the function and don't instead invent their own thing (when it is
> known to be safe).
> 
> Does that make sense?

Maybe it does, but only if we start documenting these aspects
project-wide.  It makes little sense to me to do that for a single
API, and not an important one at that.  But that's me.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Mon, 19 Oct 2015 07:35:02 GMT) Full text and rfc822 format available.

Message #20 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Mon, 19 Oct 2015 09:34:15 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
>> Cc: 21702 <at> debbugs.gnu.org
>> Date: Sun, 18 Oct 2015 21:12:34 +0200
>> 
>> I'd like to point out that (in the most extreme cases) people have
>> actually been writing web servers and other such programs in Elisp for
>> which one would normally use a general-purpose language.
>> 
>> That is, "APIs that could be maliciously abused" is not the right way to
>> look at it.  It's not about the Elisp programmer abusing the API, it's
>> about a malicious data source exploiting a (potential) flaw in an Elisp
>> function, which Elisp programmers have relied on and thus made their
>> programs vulnerable to code injection.
>> 
>> 
>> That's why I was being so careful with regard to the safety guarantees
>> of the "shell-quasiquote" package I contributed.  I would like people to
>> be able to use that as part of a general-purpose Elisp language, and so
>> being safe against code injection is an absolute must.  They might after
>> all use it as part of a network-facing service.
>> 
>> 
>> Actually that might also apply when using e.g. TRAMP, which also
>> communicates with remote hosts and is a normal part of Emacs.  I've been
>> told it receives file names from remote hosts and passes them through
>> shell-quote-argument before giving them to a shell.  So maybe my
>> concerns apply there as well.
>> 
>> 
>> Given that, "I think 1) is now covered" is not very relieving to hear.
>
> Item 1 was this:
>
>> >> The function should clearly document
>> >> 
>> >>     1) for which shells will the quoting work absolutely, i.e. lead to
>> >>     the given string to appear *verbatim* in an element of the ARGV of
>> >>     the called command,
>
> There's nothing about safety here, only about correctness.  That is
> the aspect that I think is now covered, as the doc string now says for
> which shells one can have correct results.

Usually it's indeed correctness that protects against injection attacks.
A quoting mechanism that's correct is automatically safe.  Another way
to make it safe would be to error when the given string contains
characters outside of a limited character set.

Either way, the safeness should be documented clearly, either implicitly
through a clear documentation of the correctness, or explicitly.

In your patch, correctness is implied, but the complexity of the problem
domain (and thus the function itself) and the importance of possible
repercussions of an incorrect implementation leave clearer documentation
to be desired.  While any function is really implied to be correct by
its existence, any function is also implied to very possibly contain
bugs, as is natural for software.  In many cases these bugs are
unimportant.  In this case not.

I would propose something along the lines of:

    It is guaranteed that ARGUMENT will be parsed as a single token by
    shells X, Y, and Z, as long as it is separated from other text via a
    delimiter in the syntax of the respective shell.

(That's even better than the patch I proposed, which didn't mention the
problem of delimiting.)

I think it's also important to provide some explicit enumeration of
shells for which the function is safe, because the systems Emacs
supports may change over time, and there is no guarantee that a change
to this function will not entail bugs.  If we add wording like the
above, then any programmer who sits down to expand the function's
semantics to another shell will be forced to think very hard about what
they're doing; otherwise they might try to do a "good enough" job but
not make sure that all edge-cases are handled.  "Designed to work with"
is after all not an absolute claim of correctness and absence of bugs.

>> It amounts to "I think this is safe against code injection" which is
>> rather alarming to hear.  Either it's very confidently known to be safe
>> and so one may use it for network-facing code, or it's not confidently
>> known to be safe and so one shouldn't use it for network-facing code.
>> This should be documented clearly especially so that users who aren't
>> very aware of injection attacks won't nonchalantly use the function for
>> their network-facing code (when the function isn't known to be safe for
>> this), but also so that users who are aware of such issues know they can
>> use the function and don't instead invent their own thing (when it is
>> known to be safe).
>> 
>> Does that make sense?
>
> Maybe it does, but only if we start documenting these aspects
> project-wide.  It makes little sense to me to do that for a single
> API, and not an important one at that.  But that's me.

This is an API which if its implementation is imperfect will result in
programs prone to code injection attacks when these programs face
untrusted input sources.  Why do you say it's not an important one?

Taylan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Mon, 19 Oct 2015 07:49:02 GMT) Full text and rfc822 format available.

Message #23 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Mon, 19 Oct 2015 10:47:28 +0300

> From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
> Cc: 21702 <at> debbugs.gnu.org
> Date: Mon, 19 Oct 2015 09:34:15 +0200
> 
> > Item 1 was this:
> >
> >> >> The function should clearly document
> >> >> 
> >> >>     1) for which shells will the quoting work absolutely, i.e. lead to
> >> >>     the given string to appear *verbatim* in an element of the ARGV of
> >> >>     the called command,
> >
> > There's nothing about safety here, only about correctness.  That is
> > the aspect that I think is now covered, as the doc string now says for
> > which shells one can have correct results.
> 
> Usually it's indeed correctness that protects against injection attacks.
> A quoting mechanism that's correct is automatically safe.

And that is the current situation, AFAIU.

> Another way to make it safe would be to error when the given string
> contains characters outside of a limited character set.

What limited set would you suggest that will not make the function
useless in real-life scenarios?

In any case, I think quoting is better than rejecting, as it supports
more use cases.

> Either way, the safeness should be documented clearly, either implicitly
> through a clear documentation of the correctness, or explicitly.

Like I said, this convention should be adopted project-wide.  Doing so
only in a few doc strings, let alone one, will only confuse, because
the user will not know whether the lack of such documentation means
the API is safe or unsafe.

> I would propose something along the lines of:
> 
>     It is guaranteed that ARGUMENT will be parsed as a single token by
>     shells X, Y, and Z, as long as it is separated from other text via a
>     delimiter in the syntax of the respective shell.

I don't think we want to mention specific shells explicitly, because
maintaining such a list would be a burden.  The standard shell of each
OS is well defined and known to the users of the respective systems.
Moreover, Emacs by default uses that shell automatically.

> >> Does that make sense?
> >
> > Maybe it does, but only if we start documenting these aspects
> > project-wide.  It makes little sense to me to do that for a single
> > API, and not an important one at that.  But that's me.
> 
> This is an API which if its implementation is imperfect will result in
> programs prone to code injection attacks when these programs face
> untrusted input sources.  Why do you say it's not an important one?

Because there are many much more important ones that can do much more
harm more easily.  In particular, a shell command doesn't need to be
quoted to be harmful or malicious.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Mon, 19 Oct 2015 09:23:02 GMT) Full text and rfc822 format available.

Message #26 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Mon, 19 Oct 2015 11:22:16 +0200

[Message part 1 (text/plain, inline)]

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
>> Cc: 21702 <at> debbugs.gnu.org
>> Date: Mon, 19 Oct 2015 09:34:15 +0200
>> 
>> > Item 1 was this:
>> >
>> >> >> The function should clearly document
>> >> >> 
>> >> >>     1) for which shells will the quoting work absolutely, i.e. lead to
>> >> >>     the given string to appear *verbatim* in an element of the ARGV of
>> >> >>     the called command,
>> >
>> > There's nothing about safety here, only about correctness.  That is
>> > the aspect that I think is now covered, as the doc string now says for
>> > which shells one can have correct results.
>> 
>> Usually it's indeed correctness that protects against injection attacks.
>> A quoting mechanism that's correct is automatically safe.
>
> And that is the current situation, AFAIU.
>
>> Another way to make it safe would be to error when the given string
>> contains characters outside of a limited character set.
>
> What limited set would you suggest that will not make the function
> useless in real-life scenarios?
>
> In any case, I think quoting is better than rejecting, as it supports
> more use cases.
>
>> Either way, the safeness should be documented clearly, either implicitly
>> through a clear documentation of the correctness, or explicitly.
>
> Like I said, this convention should be adopted project-wide.  Doing so
> only in a few doc strings, let alone one, will only confuse, because
> the user will not know whether the lack of such documentation means
> the API is safe or unsafe.

Yes, it should be done for every function for which the concerns I've
explained apply.  So let's start from this one.

>> I would propose something along the lines of:
>> 
>>     It is guaranteed that ARGUMENT will be parsed as a single token by
>>     shells X, Y, and Z, as long as it is separated from other text via a
>>     delimiter in the syntax of the respective shell.
>
> I don't think we want to mention specific shells explicitly, because
> maintaining such a list would be a burden.  The standard shell of each
> OS is well defined and known to the users of the respective systems.
> Moreover, Emacs by default uses that shell automatically.

For instance: POSIX sh, MS-DOS, and Windows NT, is not a long list.  (I
don't really know what shells MS-DOS and Windows NT use; a more precise
naming would be good.)

The payoff of the small burden is having clear safety guarantees.

>> >> Does that make sense?
>> >
>> > Maybe it does, but only if we start documenting these aspects
>> > project-wide.  It makes little sense to me to do that for a single
>> > API, and not an important one at that.  But that's me.
>> 
>> This is an API which if its implementation is imperfect will result in
>> programs prone to code injection attacks when these programs face
>> untrusted input sources.  Why do you say it's not an important one?
>
> Because there are many much more important ones that can do much more
> harm more easily.  In particular, a shell command doesn't need to be
> quoted to be harmful or malicious.

There being other important cases, does not make this a less important
case.  It is exactly as important as I've already said.

I don't understand what "a shell command doesn't need to be quoted to be
harmful" is supposed to mean; quoting is what *makes* the arguments
harmless, by ensuring they cleanly end up in the ARGV of a called
command instead of causing arbitrary behavior of the shell.

Here's a patch doing an improvement to the documentation like the one I
proposed.  Of course, if you have verified that shells other than POSIX
sh are fully safe, feel free to improve the docstring accordingly.

Taylan

[0001-lisp-subr.el-shell-quote-argument-Improve-documentat.patch (text/x-diff, inline)]

From bb746be5638a17c99e1647ecc178e3b9d97e4ba3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Taylan=20Ulrich=20Bay=C4=B1rl=C4=B1/Kammer?=
 <taylanbayirli <at> gmail.com>
Date: Sun, 18 Oct 2015 14:23:35 +0200
Subject: [PATCH] * lisp/subr.el (shell-quote-argument): Improve documentation.

---
 lisp/subr.el | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/lisp/subr.el b/lisp/subr.el
index c903ee3..e55647b 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -2713,8 +2713,14 @@ Note: :data and :device are currently not supported on Windows."
 (defun shell-quote-argument (argument)
   "Quote ARGUMENT for passing as argument to an inferior shell.
 
-This function is designed to work with the syntax of your system's
-standard shell, and might produce incorrect results with unusual shells."
+This is safe for shells conforming to POSIX sh.  No safety
+guarantees are made for other shells, but the standard MS-DOS and
+Windows NT shells are supported as well.
+
+Being safe in this context means that as long as the result is
+surrounded by delimiters in the syntax of the respective shell,
+it's guaranteed that it will be parsed as one token and that the
+value of the token will be exactly ARGUMENT."
   (cond
    ((eq system-type 'ms-dos)
     ;; Quote using double quotes, but escape any existing quotes in
-- 
2.5.0

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Mon, 19 Oct 2015 09:33:02 GMT) Full text and rfc822 format available.

Message #29 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Mon, 19 Oct 2015 12:32:22 +0300

> From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
> Cc: 21702 <at> debbugs.gnu.org
> Date: Mon, 19 Oct 2015 11:22:16 +0200
> 
> > Like I said, this convention should be adopted project-wide.  Doing so
> > only in a few doc strings, let alone one, will only confuse, because
> > the user will not know whether the lack of such documentation means
> > the API is safe or unsafe.
> 
> Yes, it should be done for every function for which the concerns I've
> explained apply.  So let's start from this one.

Before we start, we need a _decision_ to do that everywhere.  Then we
could start doing that piecemeal.  Before the decision is made,
there's no reason to make any such changes.

> >> I would propose something along the lines of:
> >> 
> >>     It is guaranteed that ARGUMENT will be parsed as a single token by
> >>     shells X, Y, and Z, as long as it is separated from other text via a
> >>     delimiter in the syntax of the respective shell.
> >
> > I don't think we want to mention specific shells explicitly, because
> > maintaining such a list would be a burden.  The standard shell of each
> > OS is well defined and known to the users of the respective systems.
> > Moreover, Emacs by default uses that shell automatically.
> 
> For instance: POSIX sh, MS-DOS, and Windows NT, is not a long list.

This list doesn't name shells on DOS and Windows (there are several
good candidates).  As for Posix, is it only sh?  What about Bash? what
about zsh?

You see, the moment you come up with a list such as above, people will
start complaining that their favorite shell is not in the list, and
the list will grow.  Then we will discover that some shells are not
really compatible after all, etc. etc.  It's a maintenance burden we
had better avoided.

Saying "the standard shell" avoids all that nicely, because it refers
to a single well-known shell.

> I don't understand what "a shell command doesn't need to be quoted to be
> harmful" is supposed to mean

Something like this:

  rm -rf /*

> Here's a patch doing an improvement to the documentation like the one I
> proposed.  Of course, if you have verified that shells other than POSIX
> sh are fully safe, feel free to improve the docstring accordingly.

Thanks.  However, like I said, I don't think this change would be
correct, or needed.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Mon, 19 Oct 2015 09:51:02 GMT) Full text and rfc822 format available.

Message #32 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Mon, 19 Oct 2015 11:50:23 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
>> Cc: 21702 <at> debbugs.gnu.org
>> Date: Mon, 19 Oct 2015 11:22:16 +0200
>> 
>> > Like I said, this convention should be adopted project-wide.  Doing so
>> > only in a few doc strings, let alone one, will only confuse, because
>> > the user will not know whether the lack of such documentation means
>> > the API is safe or unsafe.
>> 
>> Yes, it should be done for every function for which the concerns I've
>> explained apply.  So let's start from this one.
>
> Before we start, we need a _decision_ to do that everywhere.  Then we
> could start doing that piecemeal.  Before the decision is made,
> there's no reason to make any such changes.

Given all the reasons I listed, I would expect that decision to be
obvious.

>> >> I would propose something along the lines of:
>> >> 
>> >>     It is guaranteed that ARGUMENT will be parsed as a single token by
>> >>     shells X, Y, and Z, as long as it is separated from other text via a
>> >>     delimiter in the syntax of the respective shell.
>> >
>> > I don't think we want to mention specific shells explicitly, because
>> > maintaining such a list would be a burden.  The standard shell of each
>> > OS is well defined and known to the users of the respective systems.
>> > Moreover, Emacs by default uses that shell automatically.
>> 
>> For instance: POSIX sh, MS-DOS, and Windows NT, is not a long list.
>
> This list doesn't name shells on DOS and Windows (there are several
> good candidates).  As for Posix, is it only sh?  What about Bash? what
> about zsh?
>
> You see, the moment you come up with a list such as above, people will
> start complaining that their favorite shell is not in the list, and
> the list will grow.  Then we will discover that some shells are not
> really compatible after all, etc. etc.  It's a maintenance burden we
> had better avoided.
>
> Saying "the standard shell" avoids all that nicely, because it refers
> to a single well-known shell.

Dash, Bash and (AFAIK all versions of) ksh are POSIX sh compliant.  Zsh
not unless when requested IIRC; in any case "POSIX sh" is well-defined.

My latest patch says "standard shells of MS-DOS and Windows NT."  Feel
free to improve that if necessary.

>> I don't understand what "a shell command doesn't need to be quoted to be
>> harmful" is supposed to mean
>
> Something like this:
>
>   rm -rf /*

What are you trying to say?  Of course an arbitrary shell command can do
anything.  The whole point of shell-quote-argument is to prevent a
string which is meant purely as an argument to a command to become
equivalent in power to an arbitrary shell command.

>> Here's a patch doing an improvement to the documentation like the one I
>> proposed.  Of course, if you have verified that shells other than POSIX
>> sh are fully safe, feel free to improve the docstring accordingly.
>
> Thanks.  However, like I said, I don't think this change would be
> correct, or needed.

I've explained the need for the change, and it is correct.

I don't understand why you're trying to make everything so difficult.
If for reasons unclear to me you absolutely refuse to accept these
improvements to shell-quote-argument's documentation, I will just
continue not using the function, because it cannot be trusted.

Taylan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Mon, 19 Oct 2015 10:20:02 GMT) Full text and rfc822 format available.

Message #35 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Mon, 19 Oct 2015 13:19:26 +0300

> From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
> Cc: 21702 <at> debbugs.gnu.org
> Date: Mon, 19 Oct 2015 11:50:23 +0200
> 
> I don't understand why you're trying to make everything so difficult.

I don't.  We just disagree, that's all.

I modified the doc string to add the missing information about the
shells for which the function was designed.  I don't think we should
add anything else, for the reasons I pointed out already.

> If for reasons unclear to me you absolutely refuse to accept these
> improvements to shell-quote-argument's documentation, I will just
> continue not using the function, because it cannot be trusted.

How can documentation make a function more trustworthy?  And what does
that have to do with this bug report?  This bug report is about the
documentation of shell-quote-argument, not whether it is safe and
should or should not be used.

I think this bug should be closed now.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#21702; Package emacs. (Mon, 19 Oct 2015 10:26:01 GMT) Full text and rfc822 format available.

Message #38 received at 21702 <at> debbugs.gnu.org (full text, mbox):

From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 21702 <at> debbugs.gnu.org
Subject: Re: bug#21702: shell-quote-argument semantics and safety
Date: Mon, 19 Oct 2015 12:25:17 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer)
>> Cc: 21702 <at> debbugs.gnu.org
>> Date: Mon, 19 Oct 2015 11:50:23 +0200
>> 
>> I don't understand why you're trying to make everything so difficult.
>
> I don't.  We just disagree, that's all.
>
> I modified the doc string to add the missing information about the
> shells for which the function was designed.  I don't think we should
> add anything else, for the reasons I pointed out already.
>
>> If for reasons unclear to me you absolutely refuse to accept these
>> improvements to shell-quote-argument's documentation, I will just
>> continue not using the function, because it cannot be trusted.
>
> How can documentation make a function more trustworthy?  And what does
> that have to do with this bug report?  This bug report is about the
> documentation of shell-quote-argument, not whether it is safe and
> should or should not be used.
>
> I think this bug should be closed now.

I don't want to repeat myself for the dozenth time so do as you wish,
I'll simply continue not using the function nonchalantly and recommend
others to do the same.

Taylan

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Thu, 22 Oct 2015 03:51:01 GMT) Full text and rfc822 format available.

Notification sent to taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer):
bug acknowledged by developer. (Thu, 22 Oct 2015 03:51:02 GMT) Full text and rfc822 format available.

Message #43 received at 21702-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 21702-done <at> debbugs.gnu.org
Cc: Taylan Ulrich Bayırlı/Kammer
 <taylanbayirli <at> gmail.com>
Subject: Re: shell-quote-argument semantics and safety
Date: Wed, 21 Oct 2015 20:49:37 -0700

I installed a patch to the Emacs manual that attempts to address the 
documentation problem, and am boldly closing the bug. The bug report can 
be reopened if more work is needed re shell-quote-argument's 
documentation. For the patch, please see:

http://git.savannah.gnu.org/cgit/emacs.git/commit/?id=f373e812d95e1822833f88db024e011a769998b4

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 19 Nov 2015 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 9 years and 271 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #21702 shell-quote-argument semantics and safety

GNU bug report logs - #21702
shell-quote-argument semantics and safety