#78984 - 31.0.50; `url-build-query-string' fails to escape a literal `%' in keys and values

GNU bug report logs - #78984
31.0.50; `url-build-query-string' fails to escape a literal `%' in keys and values

Package: emacs;

Reported by: Steven Allen <steven <at> stebalien.com>

Date: Wed, 9 Jul 2025 21:30:04 UTC

Severity: normal

Found in version 31.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

View this message in rfc822 format

From: Steven Allen <steven <at> stebalien.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 78984 <at> debbugs.gnu.org Subject: bug#78984: 31.0.50; `url-build-query-string' fails to escape a literal `%' in keys and values Date: Thu, 17 Jul 2025 13:01:45 -0700

[Message part 1 (text/plain, inline)]

Eli Zaretskii <eliz <at> gnu.org> writes: >> That sounds like a reasonable solution for `url-host-allowed-chars', >> `url-path-allowed-chars', `url-query-allowed-chars', and >> `url-query-key-value-allowed-chars': >> >> 1. Allowing `%' is "correct" as `%' is, in fact, allowed in URLs. >> >> 2. These variables are used in conjunction with `url-hexify-string' to >> escape strings that may already be partially %-encoded. >> >> 3. From what I can tell, the documentation in these cases is technically >> correct already, if a bit misleading. >> >> However, IMO, `url-build-query-string' should escape everything, >> including literal `%' characters: >> >> 1. `url-parse-query-string', the inverse of `url-build-query-string', >> decodes %-encoded strings. In order for these two functions to >> round-trip, `url-build-query-string' needs to escape `%'. Specifically, >> the following test should pass: >> >> (ert-deftest url-query-string-round-trips () >> (let* ((query-string "foo=%25%3D") >> (parsed-query (url-parse-query-string query-string)) >> (round-trip (url-build-query-string parsed-query))) >> (should (equal round-trip query-string)))) >> >> 2. `url-build-query-string' is a higher-level tool to transform structured >> data into a query string. IMO, the user shouldn't ahve to think about >> pre-escaping their strings here. >> >> I'm happy to provide patches (both to improve the docs and fix this >> issue) if you think this is a reasonable approach. > > The situation you described sounds like a contradiction to me: some > use cases need url-build-query-string to have literal % encoded as > %25, and others need the literal % NOT encoded. In particular, you > originally said that the root cause of the problem was that > url-path-allowed-chars allows '%', but now you say that > url-build-query-string should encode literal '%', although > url-path-allowed-chars is okay as it is? So I don't understand what > kind of a solution you have in mind; perhaps I'm missing something? > > I guess I'm confused. > > Thanks. I apologize, I shouldn't have called that the root cause. `url-path-allowed-chars' allowing % isn't incorrect, it's just the ultimate reason `url-build-query-string' passes through literal % characters without encoding them. As a matter of fact, this (`url-path-allowed-chars') cannot be changed without breaking `url-path-and-query' and `url-encode-url' as I will explain below: First, `url-path-and-query' must not %-decode the path and the query as that would make it impossible to parse them correctly. E.g., "/some%2fpath" must not be decoded as "/some/path" before splitting the path into components and the same goes for query strings with respect to & and =. Specifically, I'm referring to the following in `url-encode-url': (let* ... (path-and-query (url-path-and-query obj)) Therefore, `url-encode-url' must not %-encode literal % characters when encoding the path and query because they were not (nor could they have been) %-decoded by `url-path-and-query'. Specifically, I'm referring to the following in `url-encode-url': (if path (setq path (url-hexify-string path url-path-allowed-chars))) (if query (setq query (url-hexify-string query url-query-allowed-chars))) That's why `url-path-allowed-chars' and `url-query-allowed-chars' must allow literal % characters without re-encoding them. However, `url-build-query-string' and `url-parse-query-string' are different. Unlike `url-path-and-query', `url-parse-query-string' decodes the query into a structured object (a key/values alist) and can (and does) completely %-decode the keys & values. That means the inverse, `url-build-query-string', should assume that its inputs are not already %-encoded and should %-encode literal % characters. I've attached a patch to show you what I mean. In this patch, I introduced a new `url--query-key-value-preserved-chars' constant, separate from `url-query-key-value-allowed-chars', to keep `url-query-key-value-allowed-chars' consistent with the other `-allowed-chars' constants. That is, while % is ALLOWED to appear in encoded query keys and values, it shouldn't be PRESERVED when %-escaping them. (I'm also happy to change `url-query-key-value-allowed-chars' directly if that's what you'd prefer)

This bug report was last modified 12 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #78984 31.0.50; `url-build-query-string' fails to escape a literal `%' in keys and values

GNU bug report logs - #78984
31.0.50; `url-build-query-string' fails to escape a literal `%' in keys and values