#46342 - 28.0.50; socks-send-command munges IP address bytes to UTF-8

GNU bug report logs - #46342
28.0.50; socks-send-command munges IP address bytes to UTF-8

Package: emacs;

Reported by: "J.P." <jp <at> neverwas.me>

Date: Sat, 6 Feb 2021 11:47:01 UTC

Severity: normal

Tags: fixed, patch

Found in version 28.0.50

Fixed in version 28.1

Done: "J.P." <jp <at> neverwas.me>

Bug is archived. No further changes may be made.

Message #17 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes to UTF-8
Date: Sun, 07 Feb 2021 06:22:53 -0800

[Message part 1 (text/plain, inline)]

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Re appropriate encoding: correct me if I'm wrong (internet), but among
>> the Emacs coding systems, it'd be latin-1.
>
> That depends on what the other end expects. Does it expect latin-1 in
> this case?

From the point of view of Emacs, I'd say yes: the other end, meaning the
proxy service, expects latin-1. From the service's point of view, it only
speaks byte sequences and doesn't interpret any fields as text [1]. This
continues after proxying has commenced; incoming byte sequences are
forwarded verbatim as opaque payloads.

> Does emitting the single byte \330 produce the correct result in this
> case? Then by all means please use
>
>   (encode-coding-string address 'latin-1)

It does indeed produce the correct result [2], and I've updated the patch
to reflect this. I wasn't sure whether you wanted me to replace all the
vectors in the tests with strings and/or annotate them with comments
explaining the protocol, so I just left them as is for now.

My main concern (based on sheer ignorance) was any possible side effects
that may occur from encode-coding-string setting the variable
last-coding-system-used to latin-1. I investigated a little by stepping
through the subsequent send_process() call and found that the variable's
value as latin-1 appears short-lived because it's quickly reassigned to
binary. I tried to demonstrate this in the attached log of my debug
session (and also show that no conversion is performed). Please pardon my
sad debugging skills.

>> Re program on the other end: this would be any program offering a proxy
>> service that speaks the same protocol. Popular ones include tor and ssh.
>> [...]
>
> And those expect Latin-1 encoding in this case?

I'd say yes, insofar as these programs are examples of a proxy service of
the sort mentioned in the first answer above.

Thanks again

[1] Although, in the case of SOCKS 4A/5, non-numeric addresses, i.e.,
    domain names, should probably be expressed in characters a resolver
    can understand, like the Punycode ASCII subset.

[2] There is one tiny difference in behavior from the previous iteration
    of this patch, but it's not worth anyone's time, so I'll just note it
    here for the record: when called in the manner shown in the patch,
    encode-coding-string silently replaces multibyte characters with
    spaces. The only edge case I could think of in which accidentally
    passing a multibyte character might be harder to debug than a normal
    typo would be when hitting an address like
    ec2-13-56-13-123.us-west-1.compute.amazonaws.com and accidentally
    passing 13.256.13.123 (as "\15\u0100\15\173"), which would be routed
    to 13.32.13.123 (flickr/cloudflare). One way to avoid this would be
    with validation like that performed by unibyte-string or,
    alternatively, by purposefully violating the protocol and sending,
    say, "\15\15{" instead of "\15 \15{" (and thereby triggering an error
    response from the service). All in all, this seems unlikely enough
    not to warrant special attention.
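The byte-level difference discussed in this message can be illustrated outside Emacs. The following is only a rough Python sketch, not the patch itself: the helper name pack_ipv4 and the sample address 93.184.216.34 are made up for the illustration (the address was picked because it contains the octet 216, i.e. octal \330). It shows that a character in the range 128-255 becomes one byte under latin-1 but two bytes under UTF-8, and that range validation, in the spirit of what unibyte-string does, rejects an octet such as 256 before anything is sent.

```python
def pack_ipv4(address: str) -> bytes:
    """Pack a dotted-quad string into four raw bytes (hypothetical helper)."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4:
        raise ValueError(f"expected 4 octets, got {len(octets)}")
    # bytes() raises ValueError for any value outside 0..255,
    # so a typo like 256 is caught here instead of being sent.
    return bytes(octets)


# Arbitrary sample address containing the octet 216 (octal \330).
raw = pack_ipv4("93.184.216.34")

# Treat each octet as a character, as happens when the address is
# held in a text string before being written to the socket.
as_text = "".join(chr(octet) for octet in raw)

latin1 = as_text.encode("latin-1")  # one byte per character
utf8 = as_text.encode("utf-8")      # two bytes for each octet >= 128

print(latin1.hex())  # 5db8d822
print(utf8.hex())    # 5dc2b8c39822

# The edge case from footnote [2]: an out-of-range octet is rejected.
try:
    pack_ipv4("13.256.13.123")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

Only the latin-1 result matches the four address bytes a SOCKS service expects; the UTF-8 result is six bytes long and would shift every field that follows the address.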

[debug_session.log (text/plain, attachment)]

This bug report was last modified 4 years and 143 days ago.

