GNU bug report logs -
#46342
28.0.50; socks-send-command munges IP address bytes to UTF-8
Previous Next
Reported by: "J.P." <jp <at> neverwas.me>
Date: Sat, 6 Feb 2021 11:47:01 UTC
Severity: normal
Tags: fixed, patch
Found in version 28.0.50
Fixed in version 28.1
Done: "J.P." <jp <at> neverwas.me>
Bug is archived. No further changes may be made.
Full log
Message #17 received at 46342 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:
>> Re appropriate encoding: correct me if I'm wrong (internet), but among
>> the Emacs coding systems, it'd be latin-1.
>
> That depends on what the other end expects. Does it expect latin-1 in
> this case?
From the point of view of Emacs, I'd say yes: the other end, meaning the
proxy service, expects latin-1. From the service's point of view, it
only speaks byte sequences and doesn't interpret any fields as text [1].
This continues after proxying has commenced; incoming byte sequences are
forwarded verbatim as opaque payloads.
> Does emitting the single byte \330 produce the correct result in this
> case? Then by all means please use
>
> (encode-coding-string address 'latin-1)
It does indeed produce the correct result [2], and I've updated the
patch to reflect this. I wasn't sure whether you wanted me to replace
all the vectors in the tests with strings and/or annotate them with
comments explaining the protocol, so I just left them as is for now.
My main concern (based on sheer ignorance) was any possible side effects
that may occur from encode-coding-string setting the variable
last-coding-system-used to latin-1. I investigated a little by stepping
through the subsequent send_process() call and found that the variable's
value as latin-1 appears short lived because it's quickly reassigned to
binary. I tried to demonstrate this in the attached log of my debug
session (and also show that no conversion is performed). Please pardon
my sad debugging skills.
>> Re program on the other end: this would be any program offering a proxy
>> service that speaks the same protocol. Popular ones include tor and ssh.
>> [...]
>
> And those expect Latin-1 encoding in this case?
I'd say yes, insofar as these programs are examples of a proxy service
of the sort mentioned in the first answer above.
Thanks again
[1] Although, in the case of SOCKS 4A/5, non-numeric addresses, i.e.,
domain names, should probably be expressed in characters a resolver can
understand, like the Punycode ASCII subset.
[2] there is one tiny difference in behavior from the previous iteration
of this patch, but it's not worth anyone's time, so I'll just note it
here for the record: when called in the manner shown in the patch,
encode-coding-string silently replaces multibyte characters with spaces.
The only edge case I could think of in which accidentally passing a
multibyte might be harder to debug than a normal typo would be when
hitting an address like ec2-13-56-13-123.us-west-1.compute.amazonaws.com
and accidentally passing 13.256.13.123 (as "\15\u0100\15\173"), which
would be routed to 13.32.13.123 (flickr/cloudflare).
One way to avoid this would be with validation like that performed by
unibyte-string or, alternatively, by purposefully violating the protocol
and sending say, "\15\15{" instead of "\15 \15{" (and thereby triggering
an error response from the service). All in all, this seems unlikely
enough not to warrant special attention.
[0001-Add-more-auth-related-tests-for-socks.el.patch (text/x-patch, attachment)]
[0002-Preserve-numeric-addresses-in-socks-send-command.patch (text/x-patch, attachment)]
[debug_session.log (text/plain, attachment)]
This bug report was last modified 4 years and 86 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.