GNU bug report logs - #40665
28.0.50; tls hang on local ssl


Package: emacs;

Reported by: Derek Zhou <derek <at> 3qin.us>

Date: Thu, 16 Apr 2020 16:01:02 UTC

Severity: normal

Tags: fixed

Found in version 28.0.50

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.




From: Robert Pluim <rpluim <at> gmail.com>
To: Derek Zhou <derek <at> 3qin.us>
Cc: 40665 <at> debbugs.gnu.org
Subject: bug#40665: 28.0.50; tls hang on local ssl
Date: Sun, 19 Apr 2020 16:34:38 +0200
>>>>> On Sat, 18 Apr 2020 02:44:05 +0000 (UTC), Derek Zhou <derek <at> 3qin.us> said:

    Derek> Derek Zhou writes:

    >> When this thing happens, the TLS handshakes are done properly. However,
    >> Emacs did not write anything into GnuTLS before starting to read, so it
    >> obviously could not get anything out at all. It is not really a hang, but
    >> the write never happens and the display buffer stays empty.
    >> 
    >> Derek

    Derek> Took me nearly the whole day to debug, but this one-line patch fixed my
    Derek> problem.
    Derek> My server finishes the TLS handshake within gnutls_boot itself, and if the
    Derek> sentinel is not called right after, it will never be called, so the write
    Derek> will not happen. Someone should review this carefully.

    Derek> diff --git a/src/process.c b/src/process.c
    Derek> index 91d426103d..6d497ef854 100644
    Derek> --- a/src/process.c
    Derek> +++ b/src/process.c
    Derek> @@ -5937,8 +5937,7 @@ wait_reading_process_output (intmax_t time_limit, int nsecs, int read_kbd,
    Derek>  		  /* If we have an incompletely set up TLS connection,
    Derek>  		     then defer the sentinel signaling until
    Derek>  		     later. */
    Derek> -		  if (NILP (p->gnutls_boot_parameters)
    Derek> -		      && !p->gnutls_p)
    Derek> +		  if (NILP (p->gnutls_boot_parameters))
    Derek>  #endif
    Derek>  		    {
    Derek>  		      pset_status (p, Qrun);
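To make the effect of the one-line patch concrete, here is a minimal C sketch of the guard written as a pure predicate. The function names and the two boolean parameters are hypothetical stand-ins for NILP (p->gnutls_boot_parameters) and p->gnutls_p; this models the condition only and is not Emacs code.

```c
#include <stdbool.h>

/* Hypothetical model of the guard in wait_reading_process_output that
   decides whether to run the "open" sentinel once the non-blocking
   connect has finished.  boot_params_nil stands for
   NILP (p->gnutls_boot_parameters); gnutls_p stands for p->gnutls_p.
   These functions do not exist in Emacs.  */

/* Guard before the patch.  */
bool
run_sentinel_before_patch (bool boot_params_nil, bool gnutls_p)
{
  return boot_params_nil && !gnutls_p;
}

/* Guard after the patch: p->gnutls_p is no longer consulted.  */
bool
run_sentinel_after_patch (bool boot_params_nil, bool gnutls_p)
{
  (void) gnutls_p;
  return boot_params_nil;
}
```

For a plain socket (nil parameters, gnutls_p false) and for a TLS connection whose handshake is still pending (non-nil parameters, gnutls_p true) the two versions agree. They differ only for nil parameters with gnutls_p true, which is the case where the handshake already finished inside gnutls_boot.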

Hereʼs what I think is happening:

The only way for p->gnutls_boot_parameters to become nil is here in
connect_network_socket:

      if (p->gnutls_initstage == GNUTLS_STAGE_READY)
        {
          p->gnutls_boot_parameters = Qnil;
	  /* Run sentinels, etc. */
	  finish_after_tls_connection (proc);
        }

and finish_after_tls_connection should call the sentinel, but
NON_BLOCKING_CONNECT_FD is still set, so it doesnʼt.

The next chance to call the sentinel would be from
wait_reading_process_output, but only if the handshake has been tried
and not yet completed, and here it has already completed.

wait_reading_process_output then calls delete_write_fd, which clears
NON_BLOCKING_CONNECT_FD, but it doesnʼt run the sentinel because, although
p->gnutls_boot_parameters is nil, p->gnutls_p is true.

finish_after_tls_connection never gets another chance to run, since
the socket is connected and handshaking is complete.
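The sequence above can be replayed as a tiny state machine. This is a hypothetical, heavily simplified C model: struct model_proc and the model_* and simulate_* functions are invented for illustration, each field stands for the Emacs state named in its comment, and everything else in process.c is ignored.

```c
#include <stdbool.h>

/* Hypothetical simplified model; none of these names exist in Emacs.  */
struct model_proc
{
  bool boot_params_nil;    /* NILP (p->gnutls_boot_parameters) */
  bool gnutls_p;           /* p->gnutls_p */
  bool connect_pending;    /* NON_BLOCKING_CONNECT_FD still set */
  int sentinel_runs;       /* times the "open" sentinel has fired */
};

/* Models finish_after_tls_connection: the sentinel only fires if the
   non-blocking connect flag has already been cleared.  */
static void
model_finish_after_tls (struct model_proc *p)
{
  if (!p->connect_pending)
    p->sentinel_runs++;
}

/* Models the connect-completion branch of wait_reading_process_output:
   delete_write_fd clears the flag, then the guard decides whether to
   run the sentinel.  PATCHED selects the guard from the one-line fix.  */
static void
model_connect_done (struct model_proc *p, bool patched)
{
  p->connect_pending = false;                  /* delete_write_fd */
  bool run = patched ? p->boot_params_nil
                     : (p->boot_params_nil && !p->gnutls_p);
  if (run)
    p->sentinel_runs++;
}

/* Scenario from the report: the handshake finishes inside gnutls_boot,
   called from connect_network_socket while the connect is still
   pending.  Returns how many times the sentinel ran.  */
int
simulate_handshake_in_boot (bool patched)
{
  struct model_proc p = { .boot_params_nil = false, .gnutls_p = true,
                          .connect_pending = true, .sentinel_runs = 0 };

  /* connect_network_socket: the handshake is already READY.  */
  p.boot_params_nil = true;      /* p->gnutls_boot_parameters = Qnil */
  model_finish_after_tls (&p);   /* flag still set: no sentinel */

  /* wait_reading_process_output: the connect completes.  */
  model_connect_done (&p, patched);

  /* The "Continue TLS negotiation" branch never fires, because the
     handshake is no longer in the HANDSHAKE_TRIED stage.  */
  return p.sentinel_runs;
}
```

In this model simulate_handshake_in_boot (false) returns 0 (the sentinel never runs, matching the reported symptom) and simulate_handshake_in_boot (true) returns 1.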

After your change, you've fixed this case:

    if p->gnutls_boot_parameters is nil, that means the handshake
    completed already and the TLS connection is up, so
    calling the sentinel is ok.

In other cases, where the handshake does not complete straight away in
Fgnutls_boot, it completes here in wait_reading_process_output:

		/* Continue TLS negotiation. */
		if (p->gnutls_initstage == GNUTLS_STAGE_HANDSHAKE_TRIED
		    && p->is_non_blocking_client)
		  {
		    gnutls_try_handshake (p);
		    p->gnutls_handshakes_tried++;

		    if (p->gnutls_initstage == GNUTLS_STAGE_READY)
		      {
			gnutls_verify_boot (aproc, Qnil);
			finish_after_tls_connection (aproc);
		      }

which always happens after delete_write_fd has been called, which
clears NON_BLOCKING_CONNECT_FD, so finish_after_tls_connection calls
the sentinel.
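The ordinary path can be modelled the same way. Again, the function and its local variables are hypothetical stand-ins for the process.c state, not Emacs code. Under either guard the check after delete_write_fd defers the sentinel (the boot parameters are still non-nil), and finish_after_tls_connection then runs it exactly once because the connect flag is already clear. In this model, at least, the patch leaves this path unchanged.

```c
#include <stdbool.h>

/* Hypothetical model of the usual path (names invented for
   illustration): gnutls_boot leaves the handshake unfinished, and it
   completes later in wait_reading_process_output.  PATCHED selects the
   guard checked after delete_write_fd.  Returns how many times the
   "open" sentinel fires.  */
int
simulate_handshake_later (bool patched)
{
  bool boot_params_nil = false;  /* parameters kept: handshake pending */
  bool gnutls_p = true;          /* p->gnutls_p */
  bool connect_pending = true;   /* NON_BLOCKING_CONNECT_FD set */
  int sentinel_runs = 0;

  /* The connect completes: delete_write_fd clears the flag, then the
     guard decides whether to run the sentinel now.  */
  connect_pending = false;
  bool run_now = patched ? boot_params_nil
                         : (boot_params_nil && !gnutls_p);
  if (run_now)
    sentinel_runs++;             /* not reached: parameters non-nil */

  /* "Continue TLS negotiation": the handshake reaches
     GNUTLS_STAGE_READY, and finish_after_tls_connection runs the
     sentinel because the connect flag is already clear.  */
  if (!connect_pending)
    sentinel_runs++;

  return sentinel_runs;
}
```

Both simulate_handshake_later (false) and simulate_handshake_later (true) return 1.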

One change we could make is to set p->gnutls_boot_parameters to nil
here, so that in the sequence

    Fgnutls_boot, handshake does not complete
    handshake succeeds first time in wait_reading_process_output
    delete_write_fd then checks p->gnutls_boot_parameters

the sentinel ends up getting run. But Iʼve never seen the handshake
succeed straight away before delete_write_fd is called, and if that
had ever happened in the wild we would have seen bug reports. (This is
also dragon-filled code, so I donʼt want to change it if I can help it :-))
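The hypothetical ordering just described, where the handshake succeeds in the "Continue TLS negotiation" branch before delete_write_fd runs, can be sketched under the same simplifying assumptions: invented names, and the patched guard already in place. The clear_boot_params argument models the suggested (not applied) extra change of setting p->gnutls_boot_parameters to nil in that branch.

```c
#include <stdbool.h>

/* Hypothetical model of a sequence not observed in practice: the
   "Continue TLS negotiation" branch completes the handshake before
   delete_write_fd runs.  All names are invented for illustration.
   Returns how many times the "open" sentinel fires, assuming the
   patched guard NILP (p->gnutls_boot_parameters).  */
int
simulate_early_handshake (bool clear_boot_params)
{
  bool boot_params_nil = false;  /* gnutls_boot did not finish */
  bool connect_pending = true;   /* NON_BLOCKING_CONNECT_FD still set */
  int sentinel_runs = 0;

  /* The handshake reaches GNUTLS_STAGE_READY in
     wait_reading_process_output.  */
  if (clear_boot_params)
    boot_params_nil = true;      /* the suggested extra change */
  if (!connect_pending)          /* finish_after_tls_connection */
    sentinel_runs++;             /* not reached: flag still set */

  /* The connect completes: delete_write_fd clears the flag, then the
     patched guard is checked.  */
  connect_pending = false;
  if (boot_params_nil)
    sentinel_runs++;

  return sentinel_runs;
}
```

Without the extra change the model never runs the sentinel (it returns 0); with it, the sentinel runs once.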

In short: I think the change is ok. It passes the network-stream
tests, so Iʼll run with it for a while, and push it in a week or so.

Robert




This bug report was last modified 5 years and 4 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.