GNU bug report logs - #49449
28: TLS connection never gets to "open" stage


Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Tue, 6 Jul 2021 19:42:02 UTC

Severity: normal

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.



Message #11 received at 49449 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 49449 <at> debbugs.gnu.org
Subject: Re: bug#49449: 28: TLS connection never gets to "open" stage
Date: Thu, 8 Jul 2021 09:59:26 +0200

On 7 July 2021 at 21:57, Lars Ingebrigtsen <larsi <at> gnus.org> wrote:

> Yes, it's grown somewhat organically.  :-/

Let me first say that the state of the code is not your fault! It's a product, as you say, of organic growth, and it does need a rewrite.

> I'm not able to reproduce this on Debian/bullseye, but on Macos I get
> 
> callback: status = (:error (error connection-failed "connect" :host "elpa.gnu.org" :service 443))

Yes, that is my observation too. Obviously the busy-wait part is essential: removing it makes the problem go away.
Essentially, the busy-wait postpones the call to wait_reading_process_output so that when it is eventually called, gnutls_handshake succeeds on the first try instead of first returning GNUTLS_E_AGAIN, which brings us onto a different code path.

> There's been several reports in the last week of TLS not
> working on Macos.  Has Apple pushed something new, or...  did something
> else happen lately in this area on Macos?

No, I've been harassed by this bug for quite some time but only now decided to dig deeper. Most likely it's just a matter of different timing that the process/TLS system doesn't cope with.

First, when the `url-http` call returns we have a Lisp_Process with

 gnutls_p = true
 gnutls_boot_parameters = non-nil
 gnutls_initstage = GNUTLS_STAGE_HANDSHAKE_TRIED (8)

and its file descriptor has a corresponding fd_callback_data with

 flags = FOR_WRITE | NON_BLOCKING_CONNECT_FD

because the asynchronous connect call has not yet been completed.

In the GOOD case (without busy-wait), `wait_reading_process_output` gets called right away because Emacs has nothing else to do. gnutls_try_handshake initially fails with GNUTLS_E_AGAIN, but p->outfd becomes writable, so `delete_write_fd` is called and zeroes the fd_callback_data flags. When the handshake eventually succeeds, the sentinel is called with the "open\n" event.

In the BAD case (with busy-wait), the TLS handshake succeeds right away while the descriptor flags still have NON_BLOCKING_CONNECT_FD set, so the sentinel isn't called.
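
To make the two orderings concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not the actual process.c logic: the function name `sentinel_events`, the numeric flag values, and the rule "notify the sentinel only when NON_BLOCKING_CONNECT_FD is clear" are assumptions made for illustration.

```python
# Toy model of the ordering problem; NOT the real Emacs code.
# Flag values are arbitrary and chosen only for this illustration.
FOR_WRITE = 1
NON_BLOCKING_CONNECT_FD = 2


def sentinel_events(handshake_attempts):
    """Return the sentinel events produced for a sequence of handshake attempts.

    Each element of handshake_attempts is True if that attempt completes
    the handshake, or False if it would return GNUTLS_E_AGAIN.
    """
    # State when the url-http call returns: the async connect is still pending.
    flags = FOR_WRITE | NON_BLOCKING_CONNECT_FD
    events = []
    for succeeded in handshake_attempts:
        if succeeded:
            # Assumed rule: the sentinel is only notified once the
            # connect-pending flag has been cleared.
            if not (flags & NON_BLOCKING_CONNECT_FD):
                events.append("open\n")
            break
        # GNUTLS_E_AGAIN: we go back to waiting, the descriptor is seen
        # as writable, and the flags are zeroed (the delete_write_fd step).
        flags = 0
    return events


good = sentinel_events([False, True])  # handshake needs a retry
bad = sentinel_events([True])          # handshake succeeds on the first try
print("GOOD:", good)
print("BAD: ", bad)
```

In this model the GOOD ordering delivers one "open\n" event and the BAD ordering delivers none, which matches what I see.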

Does this jog any memories?








GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.