GNU bug report logs -
#78984
31.0.50; `url-build-query-string' fails to escape a literal `%' in keys and values
Previous Next
Reported by: Steven Allen <steven <at> stebalien.com>
Date: Wed, 9 Jul 2025 21:30:04 UTC
Severity: normal
Found in version 31.0.50
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
Full log
View this message in rfc822 format
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:
>> That sounds like a reasonable solution for `url-host-allowed-chars',
>> `url-path-allowed-chars', `url-query-allowed-chars', and
>> `url-query-key-value-allowed-chars':
>>
>> 1. Allowing `%' is "correct" as `%' is, in fact, allowed in URLs.
>>
>> 2. These variables are used in conjunction with `url-hexify-string' to
>> escape strings that may already be partially %-encoded.
>>
>> 3. From what I can tell, the documentation in these cases is technically
>> correct already, if a bit misleading.
>>
>> However, IMO, `url-build-query-string' should escape everything,
>> including literal `%' characters:
>>
>> 1. `url-parse-query-string', the inverse of `url-build-query-string',
>> decodes %-encoded strings. In order for these two functions to
>> round-trip, `url-build-query-string' needs to escape `%'. Specifically,
>> the following test should pass:
>>
>> (ert-deftest url-query-string-round-trips ()
>> (let* ((query-string "foo=%25%3D")
>> (parsed-query (url-parse-query-string query-string))
>> (round-trip (url-build-query-string parsed-query)))
>> (should (equal round-trip query-string))))
>>
>> 2. `url-build-query-string' is a higher-level tool to transform structured
>> data into a query string. IMO, the user shouldn't ahve to think about
>> pre-escaping their strings here.
>>
>> I'm happy to provide patches (both to improve the docs and fix this
>> issue) if you think this is a reasonable approach.
>
> The situation you described sounds like a contradiction to me: some
> use cases need url-build-query-string to have literal % encoded as
> %25, and others need the literal % NOT encoded. In particular, you
> originally said that the root cause of the problem was that
> url-path-allowed-chars allows '%', but now you say that
> url-build-query-string should encode literal '%', although
> url-path-allowed-chars is okay as it is? So I don't understand what
> kind of a solution you have in mind; perhaps I'm missing something?
>
> I guess I'm confused.
>
> Thanks.
I apologize, I shouldn't have called that the root cause.
`url-path-allowed-chars' allowing % isn't incorrect, it's just the
ultimate reason `url-build-query-string' passes through literal %
characters without encoding them. As a matter of fact, this
(`url-path-allowed-chars') cannot be changed without breaking
`url-path-and-query' and `url-encode-url' as I will explain below:
First, `url-path-and-query' must not %-decode the path and the query as
that would make it impossible to parse them correctly. E.g.,
"/some%2fpath" must not be decoded as "/some/path" before splitting the
path into components and the same goes for query strings with respect to
& and =. Specifically, I'm referring to the following in `url-encode-url':
(let* ...
(path-and-query (url-path-and-query obj))
Therefore, `url-encode-url' must not %-encode literal % characters when
encoding the path and query because they were not (nor could they have
been) %-decoded by `url-path-and-query'. Specifically, I'm referring to
the following in `url-encode-url':
(if path
(setq path (url-hexify-string path url-path-allowed-chars)))
(if query
(setq query (url-hexify-string query url-query-allowed-chars)))
That's why `url-path-allowed-chars' and `url-query-allowed-chars' must
allow literal % characters without re-encoding them.
However, `url-build-query-string' and `url-parse-query-string' are
different. Unlike `url-path-and-query', `url-parse-query-string' decodes
the query into a structured object (a key/values alist) and can (and
does) completely %-decode the keys & values. That means the inverse,
`url-build-query-string', should assume that its inputs are not
already %-encoded and should %-encode literal % characters.
I've attached a patch to show you what I mean. In this patch, I
introduced a new `url--query-key-value-preserved-chars' constant,
separate from `url-query-key-value-allowed-chars', to keep
`url-query-key-value-allowed-chars' consistent with the other
`-allowed-chars' constants. That is, while % is ALLOWED to appear in
encoded query keys and values, it shouldn't be PRESERVED when %-escaping
them.
(I'm also happy to change `url-query-key-value-allowed-chars' directly
if that's what you'd prefer)
[0001-encode-literal-characters-when-building-query-string.patch (text/x-patch, attachment)]
This bug report was last modified 12 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.