GNU bug report logs - #46342
28.0.50; socks-send-command munges IP address bytes to UTF-8

Previous Next

Package: emacs;

Reported by: "J.P." <jp <at> neverwas.me>

Date: Sat, 6 Feb 2021 11:47:01 UTC

Severity: normal

Tags: fixed, patch

Found in version 28.0.50

Fixed in version 28.1

Done: "J.P." <jp <at> neverwas.me>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 46342 in the body.
You can then email your comments to 46342 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 06 Feb 2021 11:47:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "J.P." <jp <at> neverwas.me>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 06 Feb 2021 11:47:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.0.50; socks-send-command munges IP address bytes to UTF-8
Date: Sat, 06 Feb 2021 03:46:26 -0800
[Message part 1 (text/plain, inline)]
Tags: patch

Hi,

The behavior in question only affects SOCKS version 4 [1] and is shown
in the second attachment. The first revises some helpers and adds
additional tests I was too lazy or thoughtless to include in recent
patches.

The third contains a naive attempt at a fix but may suffer from my
inadequate grasp of Emacs coding systems [2]. Please advise if that's so
and/or I need to hit the books (or get my head examined).

Thanks,
J.P.


[1] The function socks--open-network-stream assumes all v5 addresses to
be of type 3, which means a domain name of variable length, presumably
of the LDH character set: https://tools.ietf.org/html/rfc1928#section-5

[2] Re coding-systems and proposed fix: I'm wondering if I should have
used encode-coding-string (or something else) instead. Also: while
socks--open-network-stream is perhaps the real culprit because it
creates the offending string, I figured it's more resilient to have the
function doing the sending to massage any IP address it encounters.


In GNU Emacs 28.0.50 (build 1, x86_64-redhat-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
 of 2021-01-31 built on localhost
Repository revision: 431b098a206d27a2dff6a88312c28c36926f90e9
Repository branch: master
Windowing system distributor 'Fedora Project', version 11.0.12010000
System Description: Fedora 33 (Workstation Edition)

Configured using:
 'configure --enable-check-lisp-object-type --enable-checking=yes,glyphs
 --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu
 --program-prefix= --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin
 --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share
 --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec
 --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man
 --infodir=/usr/share/info --with-dbus --with-gif --with-jpeg --with-png
 --with-rsvg --with-tiff --with-xft --with-xpm --with-x-toolkit=gtk3
 --with-gpm=no --with-xwidgets --with-modules --with-harfbuzz
 --with-cairo --with-json build_alias=x86_64-redhat-linux-gnu
 host_alias=x86_64-redhat-linux-gnu CC=gcc 'CFLAGS=-DMAIL_USE_LOCKF -O0
 -flto=auto -ffat-lto-objects -fexceptions -g3 -grecord-gcc-switches
 -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
 -Wp,-D_GLIBCXX_ASSERTIONS
 -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong
 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
 -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
 LDFLAGS=-Wl,-z,relro
 PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE
XIM XPM XWIDGETS GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs
rfc822 mml easymenu mml-sec epa derived epg epg-config gnus-util rmail
rmail-loaddefs auth-source cl-seq eieio eieio-core cl-macs
eieio-loaddefs password-cache json map text-property-search time-date
subr-x seq byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs
cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
iso-transl tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice button loaddefs faces cus-face macroexp files
window text-properties overlay sha1 md5 base64 format env code-pages
mule custom widget hashtable-print-readable backquote threads
xwidget-internal dbusbind inotify lcms2 dynamic-setting
system-font-setting font-render-setting cairo move-toolbar gtk x-toolkit
x multi-tty make-network-process emacs)

Memory information:
((conses 16 51852 8897)
 (symbols 48 6598 1)
 (strings 32 18255 2081)
 (string-bytes 1 611217)
 (vectors 16 12543)
 (vector-slots 8 172252 11638)
 (floats 8 25 39)
 (intervals 56 275 0)
 (buffers 984 10))
[0001-Add-more-auth-related-tests-for-socks.el.patch (text/x-patch, attachment)]
[0002-DROP-ME-demo-UTF-8-encoding-of-numeric-addresses.patch (text/x-patch, attachment)]
[0002-Preserve-numeric-addresses-in-socks-send-command.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 06 Feb 2021 12:27:01 GMT) Full text and rfc822 format available.

Message #8 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "J.P." <jp <at> neverwas.me>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50;
 socks-send-command munges IP address bytes to UTF-8
Date: Sat, 06 Feb 2021 14:26:31 +0200
> From: "J.P." <jp <at> neverwas.me>
> Date: Sat, 06 Feb 2021 03:46:26 -0800
> 
> [2] Re coding-systems and proposed fix: I'm wondering if I should have
> used encode-coding-string (or something else) instead. Also: while
> socks--open-network-stream is perhaps the real culprit because it
> creates the offending string, I figured it's more resilient to have the
> function doing the sending to massage any IP address it encounters.

Emacs holds all non-ASCII characters internally in (a superset of)
UTF-8 encoding.  When passing those strings to external programs or
network connections, they should be encoded as appropriate.

What I don't understand is what is the "appropriate" encoding in this
case.  Can you explain why you use literal bytes in the test?  What
are those bytes supposed to stand for, and which program is supposed
to receive this sequence of bytes on the other end of the connect
command?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 06 Feb 2021 14:21:02 GMT) Full text and rfc822 format available.

Message #11 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sat, 06 Feb 2021 06:19:55 -0800
Hi Eli. Thanks for the quick reply.

Eli Zaretskii <eliz <at> gnu.org> writes:

> What I don't understand is what is the "appropriate" encoding in this
> case. Can you explain why you use literal bytes in the test?  What
> are those bytes supposed to stand for, and which program is supposed
> to receive this sequence of bytes on the other end of the connect
> command?

Re appropriate encoding: correct me if I'm wrong (internet), but among
the Emacs coding systems, it'd be latin-1. In the proposed patch for
socks-send-command, swapping out the call to unibyte-string with
(encode-coding-string address 'latin-1) has the same effect of
preserving, say, char 216 as the single byte "\330" and not "\303\230".

Re literal bytes in test(s): I'm assuming you mean all the vectors. If
that's annoying or distracting, my apologies. I used them for no other
reason than I thought them easier to read. I also found them less error
prone than octal or hex escapes in strings, as warned about in (info
"(elisp) Non-ASCII in Strings"). However, I'm happy to follow whatever
the convention is.

Re meaning of the bytes: they stand for the fields in SOCKS 4 messages.
The protocol is byte oriented, so it's easy enough to read with tcpdump
since there's no meticulous packing going on when serializing, just
verbatim octets. Not the most economical, but after a brief negotiation,
it's off to proxying.

For example, you'd read

  4 1 0 80 93 184 216 34

as version 4, command 1 (connect), port 80 (2 bytes, big endian to 65535),
IPv4 address 93.184.216.34

Re program on the other end: this would be any program offering a proxy
service that speaks the same protocol. Popular ones include tor and ssh.
There's also a "bind" command that allows your client (Emacs) to act as
a server and listen/accept connections on the remote end as if they were
present on your local network.

I'm no expert, but I'll do my best to answer any further questions.
Thanks so much and enjoy your weekend!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 06 Feb 2021 15:16:01 GMT) Full text and rfc822 format available.

Message #14 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "J.P." <jp <at> neverwas.me>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sat, 06 Feb 2021 17:15:50 +0200
> From: "J.P." <jp <at> neverwas.me>
> Cc: 46342 <at> debbugs.gnu.org
> Date: Sat, 06 Feb 2021 06:19:55 -0800
> 
> Re appropriate encoding: correct me if I'm wrong (internet), but among
> the Emacs coding systems, it'd be latin-1.

That depends on what the other end expects.  Does it expect latin-1 in
this case?

> In the proposed patch for socks-send-command, swapping out the call
> to unibyte-string with (encode-coding-string address 'latin-1) has
> the same effect of preserving, say, char 216 as the single byte
> "\330" and not "\303\230".

Does emitting the single byte \330 produce the correct result in this
case?  Then by all means please use

   (encode-coding-string address 'latin-1)

> Re program on the other end: this would be any program offering a proxy
> service that speaks the same protocol. Popular ones include tor and ssh.
> There's also a "bind" command that allows your client (Emacs) to act as
> a server and listen/accept connections on the remote end as if they were
> present on your local network.

And those expect Latin-1 encoding in this case?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sun, 07 Feb 2021 14:24:01 GMT) Full text and rfc822 format available.

Message #17 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sun, 07 Feb 2021 06:22:53 -0800
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Re appropriate encoding: correct me if I'm wrong (internet), but among
>> the Emacs coding systems, it'd be latin-1.
>
> That depends on what the other end expects.  Does it expect latin-1 in
> this case?

From the point of view of Emacs, I'd say yes: the other end, meaning the
proxy service, expects latin-1. From the service's point of view, it
only speaks byte sequences and doesn't interpret any fields as text [1].
This continues after proxying has commenced; incoming byte sequences are
forwarded verbatim as opaque payloads.

> Does emitting the single byte \330 produce the correct result in this
> case?  Then by all means please use
>
>    (encode-coding-string address 'latin-1)

It does indeed produce the correct result [2], and I've updated the
patch to reflect this. I wasn't sure whether you wanted me to replace
all the vectors in the tests with strings and/or annotate them with
comments explaining the protocol, so I just left them as is for now.

My main concern (based on sheer ignorance) was any possible side effects
that may occur from encode-coding-string setting the variable
last-coding-system-used to latin-1. I investigated a little by stepping
through the subsequent send_process() call and found that the variable's
value as latin-1 appears short lived because it's quickly reassigned to
binary. I tried to demonstrate this in the attached log of my debug
session (and also show that no conversion is performed). Please pardon
my sad debugging skills.

>> Re program on the other end: this would be any program offering a proxy
>> service that speaks the same protocol. Popular ones include tor and ssh.
>> [...]
>
> And those expect Latin-1 encoding in this case?

I'd say yes, insofar as these programs are examples of a proxy service
of the sort mentioned in the first answer above.

Thanks again


[1] Although, in the case of SOCKS 4A/5, non-numeric addresses, i.e.,
domain names, should probably be expressed in characters a resolver can
understand, like the Punycode ASCII subset.

[2] there is one tiny difference in behavior from the previous iteration
of this patch, but it's not worth anyone's time, so I'll just note it
here for the record: when called in the manner shown in the patch,
encode-coding-string silently replaces multibyte characters with spaces.

The only edge case I could think of in which accidentally passing a
multibyte might be harder to debug than a normal typo would be when
hitting an address like ec2-13-56-13-123.us-west-1.compute.amazonaws.com
and accidentally passing 13.256.13.123 (as "\15\u0100\15\173"), which
would be routed to 13.32.13.123 (flickr/cloudflare).

One way to avoid this would be with validation like that performed by
unibyte-string or, alternatively, by purposefully violating the protocol
and sending say, "\15\15{" instead of "\15 \15{" (and thereby triggering
an error response from the service). All in all, this seems unlikely
enough not to warrant special attention.
[0001-Add-more-auth-related-tests-for-socks.el.patch (text/x-patch, attachment)]
[0002-Preserve-numeric-addresses-in-socks-send-command.patch (text/x-patch, attachment)]
[debug_session.log (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Tue, 09 Feb 2021 15:18:02 GMT) Full text and rfc822 format available.

Message #20 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Tue, 09 Feb 2021 07:17:29 -0800
[Message part 1 (text/plain, inline)]
Eli,

Apologies for my previous rambling reply. Your time is supremely
valuable, and I should've been more mindful of that. I'm hoping you'll
allow me this mulligan (ahem):

> That depends on what the other end expects.  Does it expect latin-1 in
> this case?

Yes.

> Does emitting the single byte \330 produce the correct result in this
> case?

Yes.

> Then by all means please use
>
>    (encode-coding-string address 'latin-1)

I have updated my changes to reflect this.

> And those [tor/ssh] expect Latin-1 encoding in this case?

Yes.

When you've got a sec, please instruct me on how best to proceed (even
if that means buggering off 'til I've learned more about Emacs!). :)

Appreciate your patience, as always.
J.P.
[0001-Add-more-auth-related-tests-for-socks.el.patch (text/x-patch, attachment)]
[0002-Preserve-numeric-addresses-in-socks-send-command.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Tue, 09 Feb 2021 16:15:02 GMT) Full text and rfc822 format available.

Message #23 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "J.P." <jp <at> neverwas.me>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Tue, 09 Feb 2021 18:14:29 +0200
> From: "J.P." <jp <at> neverwas.me>
> Cc: 46342 <at> debbugs.gnu.org
> Date: Tue, 09 Feb 2021 07:17:29 -0800
> 
> 
> [1:text/plain Hide]
> 
> Eli,
> 
> Apologies for my previous rambling reply. Your time is supremely
> valuable, and I should've been more mindful of that. I'm hoping you'll
> allow me this mulligan (ahem):
> 
> > That depends on what the other end expects.  Does it expect latin-1 in
> > this case?
> 
> Yes.
> 
> > Does emitting the single byte \330 produce the correct result in this
> > case?
> 
> Yes.
> 
> > Then by all means please use
> >
> >    (encode-coding-string address 'latin-1)
> 
> I have updated my changes to reflect this.
> 
> > And those [tor/ssh] expect Latin-1 encoding in this case?
> 
> Yes.
> 
> When you've got a sec, please instruct me on how best to proceed (even
> if that means buggering off 'til I've learned more about Emacs!). :)

(I'd really prefer that someone who knows more than I do about SOCKS
review this.)

If I must do that, I have a question: what kind of string can this
ADDRESS be?  My reading of RFC 1928 is that it normally is an IP
address, in which case encoding is not relevant, as it's an ASCII
string.  But it can also be a domain, right?  If so, what form can
this domain take?  If the domain has non-ASCII characters, shouldn't
it be hex-encoded, or run through IDNA?  I mean, are non-ASCII
characters in that place at all allowed?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Wed, 10 Feb 2021 13:18:02 GMT) Full text and rfc822 format available.

Message #26 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Wed, 10 Feb 2021 05:16:58 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

> what kind of string can this ADDRESS be? My reading of RFC 1928 is
> that it normally is an IP address, in which case encoding is not
> relevant, as it's an ASCII string. But it can also be a domain, right?

This patch only affects IP addresses, but I'm happy to look into the
domain name form as well.

> If so, what form can this domain take? If the domain has non-ASCII
> characters, shouldn't it be hex-encoded, or run through IDNA? I mean,
> are non-ASCII characters in that place at all allowed?

At first glance, both tor and ssh appear to call getaddrinfo() on the
remote end without accounting for the sender's locale or passing any
special IDN-related flags. But I'm still looking into these.

For now, if we're allowing anecdotal caveman logic, I'd wager the answer
is ASCII only. Here's why:

It seems feeding tor and ssh the hostname for Яндекс.рф (Yandex) as the
UTF-8 encoded byte string

  \xd0\xaf\xd0\xbd\xd0\xb4\xd0\xb5\xd0\xba\xd1\x81.\xd1\x80\xd1\x84

results in failure both when forwarding via CONNECT and when resolving
via tor's nonstandard RESOLVE command. (This is direct, no Emacs.)

However, passing the punified "xn--d1acpjx3f.xn--p1ai" works as
intended, forwarding to (or, in the case of RESOLVE, producing) an IP
from a Yandex-registered A record (for me, 77.88.55.66).

To try this at home (on separate ttys):

  $ ssh -TND 4711 my.sshd
  # tcpdump -i lo -nnX "port 4711"
  $ curl --verbose --proxy socks5h://localhost:4711 Яндекс.рф

Here's a trace for curl's actual call to the hostname conversion
function idn2_lookup_ul() [1], which is provided by GNU libidn2 [2].
It's hard to see without context, but this happens before any connection
is established (tcpdump will confirm this).

#0  Curl_idnconvert_hostname at lib/url.c:1566
#1  create_conn at lib/url.c:3583
#2  Curl_connect at lib/url.c:4027
#3  multi_runsingle at lib/multi.c:1671
#4  curl_multi_perform at lib/multi.c:2412
#5  easy_transfer at lib/easy.c:606
#6  easy_perform at lib/easy.c:696
#7  curl_easy_perform at lib/easy.c:715
#8  serial_transfers at src/tool_operate.c:2327
#9  run_all_transfers at src/tool_operate.c:2505
#10 operate at src/tool_operate.c:2621
#11 main at src/tool_main.c:277

On my machine, curl was configured to pass these flags to idn2_lookup_ul[3]:

  /* IDN2_NFC_INPUT: Normalize input string using normalization form C.
     IDN2_NONTRANSITIONAL: Perform Unicode TR46 non-transitional
     processing. */
  int flags = IDN2_NFC_INPUT | IDN2_NONTRANSITIONAL;

Apparently there are two IDNA standards: 2003 and 2008 [4]. Curl uses
the latter, but I'm not sure which, if any, puny.el favors. In the case
of Yandex,

  (puny-encode-domain "Яндекс.рф")

produces "xn--d1acpjx9e.xn--p1ai", which tor and ssh both reject (though
it's very possible I'm missing something.) Anyway, passing the version
above provided by libidn2 to socks-send-command works fine.

[1] https://github.com/curl/curl/blob/ec5d9b44a2e837fc7b82d1c60d5fae3f851620dc/lib/url.c#L1559
[2] https://www.gnu.org/software/libidn/libidn2/reference/libidn2-idn2.html#idn2-lookup-ul
[3] https://www.gnu.org/software/libidn/libidn2/reference/libidn2-idn2.html#idn2-flags
[4] https://www.unicode.org/reports/tr46/#Table_Example_Processing




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Wed, 10 Feb 2021 16:06:02 GMT) Full text and rfc822 format available.

Message #29 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "J.P." <jp <at> neverwas.me>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Wed, 10 Feb 2021 18:04:56 +0200
> From: "J.P." <jp <at> neverwas.me>
> Cc: 46342 <at> debbugs.gnu.org
> Date: Wed, 10 Feb 2021 05:16:58 -0800
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > what kind of string can this ADDRESS be? My reading of RFC 1928 is
> > that it normally is an IP address, in which case encoding is not
> > relevant, as it's an ASCII string. But it can also be a domain, right?
> 
> This patch only affects IP addresses, but I'm happy to look into the
> domain name form as well.

Then I don't understand why we need to worry about encoding.  IP
addresses are pure ASCII strings, so they need no encoding whatsoever.

I guess I will have to ask you to back up and describe what problems
you saw with the original code, and show me the details of the strings
involved in that.

Thanks for the rest of your message, but I suggest that we limit
ourselves to IP addresses for the time being, and only go to non-ASCII
domains when we have the IP address use case figured out.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Thu, 11 Feb 2021 14:59:01 GMT) Full text and rfc822 format available.

Message #32 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Thu, 11 Feb 2021 06:58:00 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

> Then I don't understand why we need to worry about encoding.  IP
> addresses are pure ASCII strings, so they need no encoding whatsoever.

Clearly, I'm failing you here. Between my dearth of communications
skills and lack of Emacs know how, I've obviously managed to deceive you
into thinking that SOCKS IP address fields ought to contain ASCII text
characters such as the following:

  1  9  2  .  1  6  8  .  1  .  1
  31 39 32 2e 31 36 38 2e 31 2e 31

However, this is not the case [a]. In version 4, all addresses are
four-byte sequences, one byte for each component of an IPv4 address, and
the ordering is left-to-right. For example:

  192 168 1  1
  c0  a8  01 01

In version 5, the one covered by RFC 1928, this is extended to include
16-byte IPv6 addresses as well as ASCII domain names. All three are
exclusive to one another but occupy the same field in a union of sorts.
The first byte of that field, the "ATYP" flag, denotes which of the
three to expect, and it appears as the "atype" argument to
socks-send-command.

> I guess I will have to ask you to back up and describe what problems
> you saw with the original code, and show me the details of the strings
> involved in that.

The Elisp manual distinguishes between multibyte and unibyte "sources"
of strings [1]. For these (SOCKS 4) IP address strings, the function
socks--open-network-stream is that source (it creates them). When such
a string includes characters with code points between 128 and 255 (the
latin-1/iso-8859-1 range), single characters are sent as two utf-8
encoded bytes, which the SOCKS service rejects as violating protocol.

Specifically, when a user passes "example.com" to the entry-point
function socks-open-network-stream, its internal helper
socks--open-network-stream resolves the host name into an IP in list
form and then converts this to a string by calling

  (apply #'format "%c%c%c%c" '(93 184 216 34))

This produces a multibyte string of the same character length:

  "]¸Ø\""

However, when socks-send-command passes this to process-send-string,
whose coding system is (binary . binary), the underlying six-byte
sequence is emitted verbatim:

  "]\302\270\303\230\""

My initial idea was to leverage the function unibyte-string to ensure
every character can be encoded in 8 bits before transmission. Regardless,
performing some combination of validating and converting before sending
may be worthwhile since it'll only run once per connection.

Sorry for the extended play-by-play. I certainly hope none of it came
off as insulting or pedantic. I'm quite certain your grasp of such
concepts long ago outpaced any understanding I could ever hope to
attain.

J.P.


[a] My versions of tor and ssh definitely honor requests like

  curl --proxy socks5h://localhost:1080 http://93.184.216.34

passing the IP address as a domain name. Although this defies RFC 1928,
which specifies FQDNs only [1], I'm getting the sense that influential
projects treat the latter more as a living standard. (Note: in its unit
tests, tor only includes this form for its extension commands [2].)

[1] (elisp) Non-ASCII in Strings, second paragraph
[1] https://tools.ietf.org/html/rfc1928#section-5
[2] https://gitweb.torproject.org/tor.git/tree/src/test/test_socks.c#n335




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Thu, 11 Feb 2021 15:29:02 GMT) Full text and rfc822 format available.

Message #35 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "J.P." <jp <at> neverwas.me>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Thu, 11 Feb 2021 17:28:06 +0200
> From: "J.P." <jp <at> neverwas.me>
> Cc: 46342 <at> debbugs.gnu.org
> Date: Thu, 11 Feb 2021 06:58:00 -0800
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Then I don't understand why we need to worry about encoding.  IP
> > addresses are pure ASCII strings, so they need no encoding whatsoever.
> 
> Clearly, I'm failing you here. Between my dearth of communications
> skills and lack of Emacs know how, I've obviously managed to deceive you
> into thinking that SOCKS IP address fields ought to contain ASCII text
> characters such as the following:
> 
>   1  9  2  .  1  6  8  .  1  .  1
>   31 39 32 2e 31 36 38 2e 31 2e 31
> 
> However, this is not the case [a]. In version 4, all addresses are
> four-byte sequences, one byte for each component of an IPv4 address, and
> the ordering is left-to-right. For example:
> 
>   192 168 1  1
>   c0  a8  01 01

Then they are what we call "raw bytes", and encoding them with
raw-text-unix should suffice.

How does the code which calls socks.el create these raw bytes?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Fri, 12 Feb 2021 14:31:01 GMT) Full text and rfc822 format available.

Message #38 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Fri, 12 Feb 2021 06:30:32 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

> Then they are what we call "raw bytes", and encoding them with
> raw-text-unix should suffice.

Thanks. Unfortunately, this produces the same utf-8 encoded bytes.

  (encode-coding-char 192 'raw-text-unix)
  ⇒ "\303\200"
  
It looks like raw-text-unix is an alias for binary [1], the coding
system already used by the network process sending the erroneous
request. I suppose it's always possible to strong arm it like

  (encode-coding-char (or (decode-char 'eight-bit c) c) 'raw-text-unix)
  ⇒ "^@" ... "\377"

But what about your original latin-1 suggestion? Is that no longer in
contention?

  (encode-coding-char 192 'latin-1)
  ⇒ "\300"

> How does the code which calls socks.el create these raw bytes?

This library has an entry-point function that's part of the url-gateway
dispatch mechanism. I can't say for certain, but it looks like url-http
is the only library directly using this facility. Regardless, the
function gets called with a (possibly multibyte) host name, which in
rare cases may be an ASCII IP address created by url-gateway.

With SOCKS4, that's kind of moot, since all names are looked up through
socks-nslookup-host, which returns an IPv4 address as a list of fixnums.
Its caller is an internal helper that converts this list into a
multibyte string for socks-send-command to emit onto the wire (where
it's then rejected by the service).

Currently, IP addresses aren't used at all for v5 connect-command
requests. And raw-byte IP addresses do not yet appear anywhere [2]. This
patch would introduce them, either as an argument to socks-send-command
or as something ephemeral produced by it (the current idea).


[1] (elisp) Coding System Basics
[2] Of course, these are generalities that don't apply to users who wire
everything up manually.






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Fri, 12 Feb 2021 15:05:02 GMT) Full text and rfc822 format available.

Message #41 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "J.P." <jp <at> neverwas.me>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Fri, 12 Feb 2021 17:04:16 +0200
> From: "J.P." <jp <at> neverwas.me>
> Cc: 46342 <at> debbugs.gnu.org
> Date: Fri, 12 Feb 2021 06:30:32 -0800
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Then they are what we call "raw bytes", and encoding them with
> > raw-text-unix should suffice.
> 
> Thanks. Unfortunately, this produces the same utf-8 encoded bytes.
> 
>   (encode-coding-char 192 'raw-text-unix)
>   ⇒ "\303\200"

192 is not a raw-byte, it's a character whose Unicode codepoint is
192.  So you get its UTF-8 sequence.

> It looks like raw-text-unix is an alias for binary [1], the coding
> system already used by the network process sending the erroneous
> request.

The problem is with how the original request is generated, not how it
is encoded.

> I suppose it's always possible to strong arm it like
> 
>   (encode-coding-char (or (decode-char 'eight-bit c) c) 'raw-text-unix)
>   ⇒ "^@" ... "\377"

That's one way, yes.  But it isn't the best one.

> But what about your original latin-1 suggestion? Is that no longer in
> contention?

No, it isn't.

>   (encode-coding-char 192 'latin-1)
>   ⇒ "\300"

Not every byte above 127 is a valid character that Latin-1 can
meaningfully encode.  It is wrong to use Latin-1 for raw bytes.  What
you need is a way of generating a unibyte string from a series of raw
bytes,

> > How does the code which calls socks.el create these raw bytes?
> 
> This library has an entry-point function that's part of the url-gateway
> dispatch mechanism. I can't say for certain, but it looks like url-http
> is the only library directly using this facility. Regardless, the
> function gets called with a (possibly multibyte) host name, which in
> rare cases may be an ASCII IP address created by url-gateway.
> 
> With SOCKS4, that's kind of moot, since all names are looked up through
> socks-nslookup-host, which returns an IPv4 address as a list of fixnums.
> Its caller is an internal helper that converts this list into a
> multibyte string for socks-send-command to emit onto the wire (where
> it's then rejected by the service).
> 
> Currently, IP addresses aren't used at all for v5 connect-command
> requests. And raw-byte IP addresses do not yet appear anywhere [2]. This
> patch would introduce them, either as an argument to socks-send-command
> or as something ephemeral produced by it (the current idea).

So what is the problem with using unibyte-string for producing a
unibyte string from a list of bytes?  It sounds like it's exactly
what is needed here, and is actually used in some places in socks.el.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 13 Feb 2021 15:45:01 GMT) Full text and rfc822 format available.

Message #44 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sat, 13 Feb 2021 07:43:51 -0800
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

> The problem is with how the original request is generated, not how it
> is encoded.

Sorry, right. That's what I should have said. And explicitly applying an
encoding to finagle the wrong characters into the right bytes is not the
way to go, I now realize.

> So what is the problem with using unibyte-string for producing a
> unibyte string from a list of bytes? It sounds like it's exactly what
> is needed here, and is actually used in some places in socks.el.

Ah, that's the ticket. Thanks. I've also moved the conversion business
out of socks-send-command and into its caller where it's hopefully a
better fit.

[0001-Add-more-auth-related-tests-for-socks.el.patch (text/x-patch, inline)]
From a077fe903fef50d76f8b0d721a1c9894a7cf3a04 Mon Sep 17 00:00:00 2001
From: "F. Jason Park" <jp <at> neverwas.me>
Date: Fri, 5 Feb 2021 05:24:55 -0800
Subject: [PATCH 1/2] Add more auth-related tests for socks.el

* test/lisp/net/socks-tests.el (auth-registration-and-suite-offer)
(filter-response-parsing-v4, filter-response-parsing-v5): Assert
auth-method selection wrangling and socks-filter parsing.
(v5-auth-user-pass, v5-auth-user-pass-blank, v5-auth-none): Show prep
and execution of the SOCKS connect command and proxying of an HTTP
request; simplify fake server.
---
 test/lisp/net/socks-tests.el | 270 ++++++++++++++++++++++++++++-------
 1 file changed, 215 insertions(+), 55 deletions(-)

diff --git a/test/lisp/net/socks-tests.el b/test/lisp/net/socks-tests.el
index b378ed2964e..340a42d79cc 100644
--- a/test/lisp/net/socks-tests.el
+++ b/test/lisp/net/socks-tests.el
@@ -21,68 +21,151 @@
 
 ;;; Code:
 
+(require 'ert)
 (require 'socks)
 (require 'url-http)
 
-(defvar socks-tests-canned-server-port nil)
+(ert-deftest socks-tests-auth-registration-and-suite-offer ()
+  (ert-info ("Default favors user/pass auth")
+    (should (equal socks-authentication-methods
+                   '((2 "Username/Password" . socks-username/password-auth)
+                     (0 "No authentication" . identity))))
+    (should (equal "\2\0\2" (socks-build-auth-list)))) ; length [offer ...]
+  (let (socks-authentication-methods)
+    (ert-info ("Empty selection/no methods offered")
+      (should (equal "\0" (socks-build-auth-list))))
+    (ert-info ("Simulate library defaults")
+      (socks-register-authentication-method 0 "No authentication"
+                                            'identity)
+      (should (equal socks-authentication-methods
+                     '((0 "No authentication" . identity))))
+      (should (equal "\1\0" (socks-build-auth-list)))
+      (socks-register-authentication-method 2 "Username/Password"
+                                            'socks-username/password-auth)
+      (should (equal socks-authentication-methods
+                     '((2 "Username/Password" . socks-username/password-auth)
+                       (0 "No authentication" . identity))))
+      (should (equal "\2\0\2" (socks-build-auth-list))))
+    (ert-info ("Removal")
+      (socks-unregister-authentication-method 2)
+      (should (equal socks-authentication-methods
+                     '((0 "No authentication" . identity))))
+      (should (equal "\1\0" (socks-build-auth-list)))
+      (socks-unregister-authentication-method 0)
+      (should-not socks-authentication-methods)
+      (should (equal "\0" (socks-build-auth-list))))))
 
-(defun socks-tests-canned-server-create (verbatim patterns)
-  "Create a fake SOCKS server and return the process.
+(ert-deftest socks-tests-filter-response-parsing-v4 ()
+  "Ensure new chunks added on right (Bug#45162)."
+  (let* ((buf (generate-new-buffer "*test-socks-filter*"))
+         (proc (start-process "test-socks-filter" buf "sleep" "1")))
+    (process-put proc 'socks t)
+    (process-put proc 'socks-state socks-state-waiting)
+    (process-put proc 'socks-server-protocol 4)
+    (ert-info ("Receive initial incomplete segment")
+      (socks-filter proc (concat [0 90 0 0 93 184 216]))
+      ;; From example.com: OK status ^      ^ msg start
+      (ert-info ("State still set to waiting")
+        (should (eq (process-get proc 'socks-state) socks-state-waiting)))
+      (ert-info ("Response field is nil because processing incomplete")
+        (should-not (process-get proc 'socks-response)))
+      (ert-info ("Scratch field holds stashed partial payload")
+        (should (string= (concat [0 90 0 0 93 184 216])
+                         (process-get proc 'socks-scratch)))))
+    (ert-info ("Last part arrives")
+      (socks-filter proc "\42") ; ?\" 34
+      (ert-info ("State transitions to complete (length check passes)")
+        (should (eq (process-get proc 'socks-state) socks-state-connected)))
+      (ert-info ("Scratch and response fields hold stash w. last chunk")
+        (should (string= (concat [0 90 0 0 93 184 216 34])
+                         (process-get proc 'socks-response)))
+        (should (string= (process-get proc 'socks-response)
+                         (process-get proc 'socks-scratch)))))
+    (delete-process proc)
+    (kill-buffer buf)))
 
-`VERBATIM' and `PATTERNS' are dotted alists containing responses.
-Requests are tried in order.  On failure, an error is raised."
-  (let* ((buf (generate-new-buffer "*canned-socks-server*"))
+(ert-deftest socks-tests-filter-response-parsing-v5 ()
+  "Ensure new chunks added on right (Bug#45162)."
+  (let* ((buf (generate-new-buffer "*test-socks-filter*"))
+         (proc (start-process "test-socks-filter" buf "sleep" "1")))
+    (process-put proc 'socks t)
+    (process-put proc 'socks-state socks-state-waiting)
+    (process-put proc 'socks-server-protocol 5)
+    (ert-info ("Receive initial incomplete segment")
+      ;; From fedora.org: 2605:bc80:3010:600:dead:beef:cafe:fed9
+      ;; 5004 ~~> Version Status (OK) NOOP Addr-Type (4 -> IPv6)
+      (socks-filter proc "\5\0\0\4\x26\x05\xbc\x80\x30\x10\x00\x60")
+      (ert-info ("State still waiting and response emtpy")
+        (should (eq (process-get proc 'socks-state) socks-state-waiting))
+        (should-not (process-get proc 'socks-response)))
+      (ert-info ("Scratch field holds partial payload of pending msg")
+        (should (string= "\5\0\0\4\x26\x05\xbc\x80\x30\x10\x00\x60"
+                         (process-get proc 'socks-scratch)))))
+    (ert-info ("Middle chunk arrives")
+      (socks-filter proc "\xde\xad\xbe\xef\xca\xfe\xfe\xd9")
+      (ert-info ("State and response fields still untouched")
+        (should (eq (process-get proc 'socks-state) socks-state-waiting))
+        (should-not (process-get proc 'socks-response)))
+      (ert-info ("Scratch contains new arrival appended (on RHS)")
+        (should (string=  (concat "\5\0\0\4"
+                                  "\x26\x05\xbc\x80\x30\x10\x00\x60"
+                                  "\xde\xad\xbe\xef\xca\xfe\xfe\xd9")
+                          (process-get proc 'socks-scratch)))))
+    (ert-info ("Final part arrives (port number)")
+      (socks-filter proc "\0\0")
+      (ert-info ("State transitions to complete")
+        (should (eq (process-get proc 'socks-state) socks-state-connected)))
+      (ert-info ("Scratch and response fields show last chunk appended")
+        (should (string=  (concat "\5\0\0\4"
+                                  "\x26\x05\xbc\x80\x30\x10\x00\x60"
+                                  "\xde\xad\xbe\xef\xca\xfe\xfe\xd9"
+                                  "\0\0")
+                          (process-get proc 'socks-scratch)))
+        (should (string= (process-get proc 'socks-response)
+                         (process-get proc 'socks-scratch)))))
+    (delete-process proc)
+    (kill-buffer buf)))
+
+(defvar socks-tests-canned-server-patterns nil
+  "Alist containing request/response cons pairs to be tried in order.
+Vectors must match verbatim. Strings are considered regex patterns.")
+
+(defun socks-tests-canned-server-create ()
+  "Create and return a fake SOCKS server."
+  (let* ((port (nth 2 socks-server))
+         (name (format "socks-server:%d" port))
+         (pats socks-tests-canned-server-patterns)
          (filt (lambda (proc line)
-                 (let ((resp (or (assoc-default line verbatim
-                                                (lambda (k s) ; s is line
-                                                  (string= (concat k) s)))
-                                 (assoc-default line patterns
-                                                (lambda (p s)
-                                                  (string-match-p p s))))))
-                   (unless resp
+                 (pcase-let ((`(,pat . ,resp) (pop pats)))
+                   (unless (or (and (vectorp pat) (equal pat (vconcat line)))
+                               (string-match-p pat line))
                      (error "Unknown request: %s" line))
                    (let ((print-escape-control-characters t))
-                     (princ (format "<- %s\n" (prin1-to-string line)) buf)
-                     (princ (format "-> %s\n" (prin1-to-string resp)) buf))
+                     (message "[%s] <- %s" name (prin1-to-string line))
+                     (message "[%s] -> %s" name (prin1-to-string resp)))
                    (process-send-string proc (concat resp)))))
-         (srv (make-network-process :server 1
-                                    :buffer buf
-                                    :filter filt
-                                    :name "server"
-                                    :family 'ipv4
-                                    :host 'local
-                                    :service socks-tests-canned-server-port)))
-    (set-process-query-on-exit-flag srv nil)
-    (princ (format "[%s] Listening on localhost:10080\n" srv) buf)
-    srv))
-
-;; Add ([5 3 0 1 2] . [5 2]) to the `verbatim' list below to validate
-;; against curl 7.71 with the following options:
-;; $ curl --verbose -U foo:bar --proxy socks5h://127.0.0.1:10080 example.com
-;;
-;; If later implementing version 4a, try these:
-;; [4 1 0 80 0 0 0 1 0 ?e ?x ?a ?m ?p ?l ?e ?. ?c ?o ?m 0] . [0 90 0 0 0 0 0 0]
-;; $ curl --verbose --proxy socks4a://127.0.0.1:10080 example.com
+         (serv (make-network-process :server 1
+                                     :buffer (get-buffer-create name)
+                                     :filter filt
+                                     :name name
+                                     :family 'ipv4
+                                     :host 'local
+                                     :coding 'binary
+                                     :service port)))
+    (set-process-query-on-exit-flag serv nil)
+    serv))
 
-(ert-deftest socks-tests-auth-filter-url-http ()
-  "Verify correct handling of SOCKS5 user/pass authentication."
-  (let* ((socks-server '("server" "127.0.0.1" 10080 5))
-         (socks-username "foo")
-         (socks-password "bar")
-         (url-gateway-method 'socks)
+(defvar socks-tests--hello-world-http-request-pattern
+  (cons "^GET /" (concat "HTTP/1.1 200 OK\r\n"
+                         "Content-Type: text/plain\r\n"
+                         "Content-Length: 13\r\n\r\n"
+                         "Hello World!\n")))
+
+(defun socks-tests-perform-hello-world-http-request ()
+  "Start canned server, validate hello-world response, and finalize."
+  (let* ((url-gateway-method 'socks)
          (url (url-generic-parse-url "http://example.com"))
-         (verbatim '(([5 2 0 2] . [5 2])
-                     ([1 3 ?f ?o ?o 3 ?b ?a ?r] . [1 0])
-                     ([5 1 0 3 11 ?e ?x ?a ?m ?p ?l ?e ?. ?c ?o ?m 0 80]
-                      . [5 0 0 1 0 0 0 0 0 0])))
-         (patterns
-          `(("^GET /" . ,(concat "HTTP/1.1 200 OK\r\n"
-                                 "Content-Type: text/plain; charset=UTF-8\r\n"
-                                 "Content-Length: 13\r\n\r\n"
-                                 "Hello World!\n"))))
-         (socks-tests-canned-server-port 10080)
-         (server (socks-tests-canned-server-create verbatim patterns))
-         (tries 10)
+         (server (socks-tests-canned-server-create))
          ;;
          done
          ;;
@@ -90,14 +173,91 @@ socks-tests-auth-filter-url-http
                (goto-char (point-min))
                (should (search-forward "Hello World" nil t))
                (setq done t)))
-         (buf (url-http url cb '(nil))))
-    (ert-info ("Connect to HTTP endpoint over SOCKS5 with USER/PASS method")
-      (while (and (not done) (< 0 (cl-decf tries))) ; cl-lib via url-http
-        (sleep-for 0.1)))
+         (buf (url-http url cb '(nil)))
+         (proc (get-buffer-process buf))
+         (attempts 10))
+    (while (and (not done) (< 0 (cl-decf attempts)))
+      (sleep-for 0.1))
     (should done)
     (delete-process server)
+    (delete-process proc) ; otherwise seems client proc is sometimes reused
     (kill-buffer (process-buffer server))
     (kill-buffer buf)
     (ignore url-gateway-method)))
 
+;; Replace first pattern below with ([5 3 0 1 2] . [5 2]) to validate
+;; against curl 7.71 with the following options:
+;; $ curl --verbose -U foo:bar --proxy socks5h://127.0.0.1:10080 example.com
+
+(ert-deftest socks-tests-v5-auth-user-pass ()
+  "Verify correct handling of SOCKS5 user/pass authentication."
+  (should (assq 2 socks-authentication-methods))
+  (let ((socks-server '("server" "127.0.0.1" 10080 5))
+        (socks-username "foo")
+        (socks-password "bar")
+        (url-user-agent "Test/auth-user-pass")
+        (socks-tests-canned-server-patterns
+         `(([5 2 0 2] . [5 2])
+           ([1 3 ?f ?o ?o 3 ?b ?a ?r] . [1 0])
+           ([5 1 0 3 11 ?e ?x ?a ?m ?p ?l ?e ?. ?c ?o ?m 0 80]
+            . [5 0 0 1 0 0 0 0 0 0])
+           ,socks-tests--hello-world-http-request-pattern)))
+    (ert-info ("Make HTTP request over SOCKS5 with USER/PASS auth method")
+      (socks-tests-perform-hello-world-http-request))))
+
+;; Services (like Tor) may be configured without auth but for some
+;; reason still prefer the user/pass method over none when offered both.
+;; Given this library's defaults, the scenario below is possible.
+;;
+;; FYI: RFC 1929 doesn't say that a username or password is required
+;; but notes that the length of both fields should be at least one.
+;; However, both socks.el and curl send zero-length fields (though
+;; curl drops the user part too when the password is empty).
+;;
+;; From Tor's docs /socks-extensions.txt, 1.1 Extent of support:
+;; > We allow username/password fields of this message to be empty ...
+;; line 41 in blob 5fd1f828f3e9d014f7b65fa3bd1d33c39e4129e2
+;; https://gitweb.torproject.org/torspec.git/tree/socks-extensions.txt
+;;
+;; To verify against curl 7.71, swap out the first two pattern pairs
+;; with ([5 3 0 1 2] . [5 2]) and ([1 0 0] . [1 0]), then run:
+;; $ curl verbose -U "foo:" --proxy socks5h://127.0.0.1:10081 example.com
+
+(ert-deftest socks-tests-v5-auth-user-pass-blank ()
+  "Verify correct SOCKS5 user/pass authentication with empty pass."
+  (should (assq 2 socks-authentication-methods))
+  (let ((socks-server '("server" "127.0.0.1" 10081 5))
+        (socks-username "foo") ; defaults to (user-login-name)
+        (socks-password "") ; simulate user hitting enter when prompted
+        (url-user-agent "Test/auth-user-pass-blank")
+        (socks-tests-canned-server-patterns
+         `(([5 2 0 2] . [5 2])
+           ([1 3 ?f ?o ?o 0] . [1 0])
+           ([5 1 0 3 11 ?e ?x ?a ?m ?p ?l ?e ?. ?c ?o ?m 0 80]
+            . [5 0 0 1 0 0 0 0 0 0])
+           ,socks-tests--hello-world-http-request-pattern)))
+    (ert-info ("Make HTTP request over SOCKS5 with USER/PASS auth method")
+      (socks-tests-perform-hello-world-http-request))))
+
+;; Swap out ([5 2 0 1] . [5 0]) with the first pattern below to validate
+;; against curl 7.71 with the following options:
+;; $ curl --verbose --proxy socks5h://127.0.0.1:10082 example.com
+
+(ert-deftest socks-tests-v5-auth-none ()
+  "Verify correct handling of SOCKS5 when auth method 0 requested."
+  (let ((socks-server '("server" "127.0.0.1" 10082 5))
+        (socks-authentication-methods (append socks-authentication-methods
+                                              nil))
+        (url-user-agent "Test/auth-none")
+        (socks-tests-canned-server-patterns
+         `(([5 1 0] . [5 0])
+           ([5 1 0 3 11 ?e ?x ?a ?m ?p ?l ?e ?. ?c ?o ?m 0 80]
+            . [5 0 0 1 0 0 0 0 0 0])
+           ,socks-tests--hello-world-http-request-pattern)))
+    (socks-unregister-authentication-method 2)
+    (should-not (assq 2 socks-authentication-methods))
+    (ert-info ("Make HTTP request over SOCKS5 with no auth method")
+      (socks-tests-perform-hello-world-http-request)))
+  (should (assq 2 socks-authentication-methods)))
+
 ;;; socks-tests.el ends here
-- 
2.29.2

[0002-Use-raw-bytes-for-SOCKS-4-IP-addresses.patch (text/x-patch, inline)]
From 8a1b4835b5ba60cae0a0ee325a3110d04c92948f Mon Sep 17 00:00:00 2001
From: "F. Jason Park" <jp <at> neverwas.me>
Date: Fri, 5 Feb 2021 19:41:04 -0800
Subject: [PATCH 2/2] Use raw bytes for SOCKS 4 IP addresses

* lisp/net/socks.el: (socks--open-network-stream):
* test/lisp/net/socks-tests.el: (socks-tests-v4-basic): (Bug#46342).
---
 lisp/net/socks.el            |  2 +-
 test/lisp/net/socks-tests.el | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/lisp/net/socks.el b/lisp/net/socks.el
index 96fafc826b8..9aaa11ab416 100644
--- a/lisp/net/socks.el
+++ b/lisp/net/socks.el
@@ -528,7 +528,7 @@ socks--open-network-stream
 	        (setq host (socks-nslookup-host host))
 	        (if (not (listp host))
 	            (error "Could not get IP address for: %s" host))
-	        (setq host (apply #'format "%c%c%c%c" host))
+		(setq host (apply #'unibyte-string host))
                 socks-address-type-v4)
                (t
                 socks-address-type-name))))
diff --git a/test/lisp/net/socks-tests.el b/test/lisp/net/socks-tests.el
index 340a42d79cc..9a2dcba9daf 100644
--- a/test/lisp/net/socks-tests.el
+++ b/test/lisp/net/socks-tests.el
@@ -185,6 +185,26 @@ socks-tests-perform-hello-world-http-request
     (kill-buffer buf)
     (ignore url-gateway-method)))
 
+;; Unlike curl, socks.el includes the ID field (but otherwise matches):
+;; $ curl --proxy socks4://127.0.0.1:1080 example.com
+
+(ert-deftest socks-tests-v4-basic ()
+  "Show correct preparation of SOCKS4 connect command (Bug#46342)."
+  (let ((socks-server '("server" "127.0.0.1" 10079 4))
+        (url-user-agent "Test/4-basic")
+        (socks-tests-canned-server-patterns
+         `(([4 1 0 80 93 184 216 34 ?f ?o ?o 0] . [0 90 0 0 0 0 0 0])
+           ,socks-tests--hello-world-http-request-pattern))
+        socks-nslookup-program)
+    (ert-info ("Make HTTP request over SOCKS4")
+      (cl-letf (((symbol-function 'socks-nslookup-host)
+                 (lambda (host)
+                   (should (equal host "example.com"))
+                   (list 93 184 216 34)))
+                ((symbol-function 'user-full-name)
+                 (lambda () "foo")))
+        (socks-tests-perform-hello-world-http-request)))))
+
 ;; Replace first pattern below with ([5 3 0 1 2] . [5 2]) to validate
 ;; against curl 7.71 with the following options:
 ;; $ curl --verbose -U foo:bar --proxy socks5h://127.0.0.1:10080 example.com
-- 
2.29.2


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Wed, 17 Feb 2021 15:00:02 GMT) Full text and rfc822 format available.

Message #47 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Wed, 17 Feb 2021 06:59:07 -0800
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

> So what is the problem with using unibyte-string for producing a
> unibyte string from a list of bytes?  It sounds like it's exactly
> what is needed here, and is actually used in some places in socks.el.

Great, thanks. That's what was needed indeed. I've updated things
accordingly and have added a doc string to socks-send-command explaining
the address parameter.

[0001-Add-more-auth-related-tests-for-socks.el.patch (text/x-patch, attachment)]
[0002-Use-raw-bytes-for-SOCKS-4-IP-addresses.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 20 Feb 2021 09:34:01 GMT) Full text and rfc822 format available.

Message #50 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "J.P." <jp <at> neverwas.me>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sat, 20 Feb 2021 11:33:30 +0200
> From: "J.P." <jp <at> neverwas.me>
> Cc: 46342 <at> debbugs.gnu.org
> Date: Wed, 17 Feb 2021 06:59:07 -0800
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > So what is the problem with using unibyte-string for producing a
> > unibyte string from a list of bytes?  It sounds like it's exactly
> > what is needed here, and is actually used in some places in socks.el.
> 
> Great, thanks. That's what was needed indeed. I've updated things
> accordingly and have added a doc string to socks-send-command explaining
> the address parameter.

Thanks.  I wanted to install this, but it failed to apply to the
master branch.  Could you please rebase on the current master branch
and resubmit?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 20 Feb 2021 10:14:01 GMT) Full text and rfc822 format available.

Message #53 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sat, 20 Feb 2021 02:13:27 -0800
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

> Thanks.  I wanted to install this, but it failed to apply to the
> master branch.  Could you please rebase on the current master branch
> and resubmit?

Hi Eli, I've gone ahead and rebased atop

  commit c85c8e7d42ae2a5fc95fa7b14257389d8383b34d
  Author: Stefan Kangas <stefan <at> marxist.se>
  
      Add toolbar for help-mode
  
   lisp/help-mode.el | 28 +++++++++++++++++++++++-----
   1 file changed, 23 insertions(+), 5 deletions(-)

However the src/dest blob hashes (i.e., git-hash-object(1) SHAs in the
diffs' index dead..beef 0644 lines) remain unchanged. Which leads me to
think maybe the failure is related to these being two individual patches
(a "change set"). I've had trouble in the past applying similar sets
using debbugs-gnu-apply-patch, but that might've just been incompetence.

Regardless, I'd be happy to squash these down into a single commit if you
think that'd help. Thanks.
[0001-Add-more-auth-related-tests-for-socks.el.patch (text/x-patch, attachment)]
[0002-Use-raw-bytes-for-SOCKS-4-IP-addresses.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 20 Feb 2021 10:42:01 GMT) Full text and rfc822 format available.

Message #56 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sat, 20 Feb 2021 02:41:40 -0800
[Message part 1 (text/plain, inline)]
I went ahead and squashed them down for good measure, in case you prefer
the single commit. BTW, I meant before/after, not src/dest re the git
diff index lines. Thanks.
[0001-Use-raw-bytes-for-SOCKS-4-IP-addresses.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 20 Feb 2021 11:10:02 GMT) Full text and rfc822 format available.

Message #59 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "J.P." <jp <at> neverwas.me>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sat, 20 Feb 2021 13:08:50 +0200
> From: "J.P." <jp <at> neverwas.me>
> Cc: 46342 <at> debbugs.gnu.org
> Date: Sat, 20 Feb 2021 02:13:27 -0800
> 
> > Thanks.  I wanted to install this, but it failed to apply to the
> > master branch.  Could you please rebase on the current master branch
> > and resubmit?
> 
> Hi Eli, I've gone ahead and rebased atop
> 
>   commit c85c8e7d42ae2a5fc95fa7b14257389d8383b34d
>   Author: Stefan Kangas <stefan <at> marxist.se>
>   
>       Add toolbar for help-mode
>   
>    lisp/help-mode.el | 28 +++++++++++++++++++++++-----
>    1 file changed, 23 insertions(+), 5 deletions(-)
> 
> However the src/dest blob hashes (i.e., git-hash-object(1) SHAs in the
> diffs' index dead..beef 0644 lines) remain unchanged. Which leads me to
> think maybe the failure is related to these being two individual patches
> (a "change set").

No, that's not the reason; the changes applied cleanly this time.  So
this is now installed on the master branch.

Running the tests, I see a lot of stuff on the console.  Do we really
need that? can some of it be shut up?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 20 Feb 2021 15:10:02 GMT) Full text and rfc822 format available.

Message #62 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: "J.P." <jp <at> neverwas.me>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sat, 20 Feb 2021 07:08:55 -0800
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

> Running the tests, I see a lot of stuff on the console.  Do we really
> need that? can some of it be shut up?

Sorry about that. I changed it so it's only noisy during interactive
sessions. Is that enough or should I go all the way? Thanks.
[0001-Mute-noisy-test-fixture-for-socks.el.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#46342; Package emacs. (Sat, 20 Feb 2021 15:20:02 GMT) Full text and rfc822 format available.

Message #65 received at 46342 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "J.P." <jp <at> neverwas.me>
Cc: 46342 <at> debbugs.gnu.org
Subject: Re: bug#46342: 28.0.50; socks-send-command munges IP address bytes
 to UTF-8
Date: Sat, 20 Feb 2021 17:19:32 +0200
> From: "J.P." <jp <at> neverwas.me>
> Cc: 46342 <at> debbugs.gnu.org
> Date: Sat, 20 Feb 2021 07:08:55 -0800
> 
> > Running the tests, I see a lot of stuff on the console.  Do we really
> > need that? can some of it be shut up?
> 
> Sorry about that. I changed it so it's only noisy during interactive
> sessions. Is that enough or should I go all the way? Thanks.

Thanks, installed.

And with that, I think we can close this bug?




Added tag(s) fixed. Request was from "J.P." <jp <at> neverwas.me> to control <at> debbugs.gnu.org. (Sun, 21 Feb 2021 06:58:01 GMT) Full text and rfc822 format available.

bug marked as fixed in version 28.1, send any further explanations to 46342 <at> debbugs.gnu.org and "J.P." <jp <at> neverwas.me> Request was from "J.P." <jp <at> neverwas.me> to control <at> debbugs.gnu.org. (Sun, 21 Feb 2021 06:58:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 21 Mar 2021 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 86 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.