GNU bug report logs - #78984
31.0.50; `url-build-query-string' fails to escape a literal `%' in keys and values

Previous Next

Package: emacs;

Reported by: Steven Allen <steven <at> stebalien.com>

Date: Wed, 9 Jul 2025 21:30:04 UTC

Severity: normal

Found in version 31.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Steven Allen <steven <at> stebalien.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78984 <at> debbugs.gnu.org
Subject: bug#78984: 31.0.50; `url-build-query-string' fails to escape a literal `%' in keys and values
Date: Thu, 17 Jul 2025 13:01:45 -0700
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:
>> That sounds like a reasonable solution for `url-host-allowed-chars',
>> `url-path-allowed-chars', `url-query-allowed-chars', and
>> `url-query-key-value-allowed-chars':
>>
>> 1. Allowing `%' is "correct" as `%' is, in fact, allowed in URLs.
>>
>> 2. These variables are used in conjunction with `url-hexify-string' to
>> escape strings that may already be partially %-encoded.
>>
>> 3. From what I can tell, the documentation in these cases is technically
>> correct already, if a bit misleading.
>>
>> However, IMO, `url-build-query-string' should escape everything,
>> including literal `%' characters:
>>
>> 1. `url-parse-query-string', the inverse of `url-build-query-string',
>> decodes %-encoded strings. In order for these two functions to
>> round-trip, `url-build-query-string' needs to escape `%'. Specifically,
>> the following test should pass:
>>
>>     (ert-deftest url-query-string-round-trips ()
>>       (let* ((query-string "foo=%25%3D")
>>              (parsed-query (url-parse-query-string query-string))
>>              (round-trip (url-build-query-string parsed-query)))
>>         (should (equal round-trip query-string))))
>>
>> 2. `url-build-query-string' is a higher-level tool to transform structured
>> data into a query string. IMO, the user shouldn't ahve to think about
>> pre-escaping their strings here.
>>
>> I'm happy to provide patches (both to improve the docs and fix this
>> issue) if you think this is a reasonable approach.
>
> The situation you described sounds like a contradiction to me: some
> use cases need url-build-query-string to have literal % encoded as
> %25, and others need the literal % NOT encoded.  In particular, you
> originally said that the root cause of the problem was that
> url-path-allowed-chars allows '%', but now you say that
> url-build-query-string should encode literal '%', although
> url-path-allowed-chars is okay as it is?  So I don't understand what
> kind of a solution you have in mind; perhaps I'm missing something?
>
> I guess I'm confused.
>
> Thanks.

I apologize, I shouldn't have called that the root cause.
`url-path-allowed-chars' allowing % isn't incorrect, it's just the
ultimate reason `url-build-query-string' passes through literal %
characters without encoding them. As a matter of fact, this
(`url-path-allowed-chars') cannot be changed without breaking
`url-path-and-query' and `url-encode-url' as I will explain below:

First, `url-path-and-query' must not %-decode the path and the query as
that would make it impossible to parse them correctly. E.g.,
"/some%2fpath" must not be decoded as "/some/path" before splitting the
path into components and the same goes for query strings with respect to
& and =. Specifically, I'm referring to the following in `url-encode-url':

    (let* ...
         (path-and-query (url-path-and-query obj))

Therefore, `url-encode-url' must not %-encode literal % characters when
encoding the path and query because they were not (nor could they have
been) %-decoded by `url-path-and-query'. Specifically, I'm referring to
the following in `url-encode-url':

    (if path
	(setq path (url-hexify-string path url-path-allowed-chars)))
    (if query
	(setq query (url-hexify-string query url-query-allowed-chars)))

That's why `url-path-allowed-chars' and `url-query-allowed-chars' must
allow literal % characters without re-encoding them.

However, `url-build-query-string' and `url-parse-query-string' are
different. Unlike `url-path-and-query', `url-parse-query-string' decodes
the query into a structured object (a key/values alist) and can (and
does) completely %-decode the keys & values. That means the inverse,
`url-build-query-string', should assume that its inputs are not
already %-encoded and should %-encode literal % characters.

I've attached a patch to show you what I mean. In this patch, I
introduced a new `url--query-key-value-preserved-chars' constant,
separate from `url-query-key-value-allowed-chars', to keep
`url-query-key-value-allowed-chars' consistent with the other
`-allowed-chars' constants. That is, while % is ALLOWED to appear in
encoded query keys and values, it shouldn't be PRESERVED when %-escaping
them.

(I'm also happy to change `url-query-key-value-allowed-chars' directly
if that's what you'd prefer)

[0001-encode-literal-characters-when-building-query-string.patch (text/x-patch, attachment)]

This bug report was last modified 12 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.