GNU bug report logs - #79017
erc-join-tests runs "forever" on Solaris 10


Package: emacs;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Mon, 14 Jul 2025 15:18:01 UTC

Severity: normal


From: "J.P." <jp <at> neverwas.me>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, emacs-erc <at> gnu.org, 79017 <at> debbugs.gnu.org
Subject: bug#79017: erc-join-tests runs "forever" on Solaris 10
Date: Mon, 18 Aug 2025 22:36:21 -0700
Hi Paul,

"J.P." <jp <at> neverwas.me> writes:

>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>>> Date: Mon, 14 Jul 2025 08:16:32 -0700
>>> From: Paul Eggert <eggert <at> cs.ucla.edu>
>>> 
>>> This ran for about 12 hours before I killed it. Emacs is CPU bound. 
>>> "truss" reports a seemingly infinite sequence of syscalls containing the 
>>> following pattern, repeated indefinitely:
>>
>> Thanks, I've added the maintainer of ERC in the hope that he could
>> have some ideas.
>>
>
> Thanks for the detailed info. The tests in that file have remained
> unchanged since they were created four years ago. Unfortunately, they
> were never updated to use the newer fixtures and helpers ERC now uses
> for mocking connections more responsibly. FWIW, I couldn't yet manage to
> reproduce the error (https://0x0.st/8dzt.png), though it's quite
> possible I didn't build Emacs correctly. In any case, I'll at least try
> to ensure those tests fail fast in the future instead of running
> forever. For now, I've just arranged to skip the failing one.
>
> J.P.
>
>>> ...
>>> lwp_sigmask(SIG_SETMASK, 0x00000002, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
>>> lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
>>> pollsys(0xFFBFE278, 2, 0xFFBFE5D8, 0x00000000)  = 1
>>> lwp_sigmask(SIG_SETMASK, 0x00000002, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
>>> lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
>>> read(25, 0x040AB840, 65536)                     = 0
>>> timer_settime(0, 1, 0xFFBFE4B8, 0x00000000)     = 0
>>> alarm(0)                                        = 0
>>> timer_settime(0, 1, 0xFFBFE438, 0x00000000)     = 0
>>> lwp_sigmask(SIG_SETMASK, 0x00000002, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
>>> lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
>>> pollsys(0xFFBFE278, 1, 0xFFBFE5D8, 0x00000000)  = 1
>>> lwp_sigmask(SIG_SETMASK, 0x00000002, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
>>> lwp_sigmask(SIG_SETMASK, 0x00000000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
>>> ...
>>> 
>>> The file lisp/erc/erc-join-tests.log looks like this:
>>> 
>>> Running 10 tests (2025-07-14 07:46:00-0700, selector `(not (or (tag 
>>> :expensive-test) (tag :unstable) (tag :nativecomp)))')
>>>     passed   1/10  erc-autojoin-add--network (0.021089 sec)
>>>     passed   2/10  erc-autojoin-add--network-extended-syntax (0.014108 sec)
>>>     passed   3/10  erc-autojoin-add--network-id (0.013998 sec)
>>>     passed   4/10  erc-autojoin-add--server (0.012283 sec)
>>>     passed   5/10  erc-autojoin-channels--connect (0.051325 sec)
>>> 
>>> pfiles reports the following extra files open, all fifos:
>>> 
>>>     5: S_IFIFO mode:0000 dev:340,0 ino:3591989 uid:5823 gid:30 size:0
>>>        O_RDWR|O_NONBLOCK FD_CLOEXEC
>>>     6: S_IFIFO mode:0000 dev:340,0 ino:3591989 uid:5823 gid:30 size:0
>>>        O_RDWR|O_NONBLOCK FD_CLOEXEC
>>> [...]
>>>    24: S_IFIFO mode:0000 dev:340,0 ino:3592014 uid:5823 gid:30 size:0
>>>        O_RDWR|O_NONBLOCK FD_CLOEXEC
>>>    25: S_IFIFO mode:0000 dev:340,0 ino:3592015 uid:5823 gid:30 size:0
>>>        O_RDWR|O_NONBLOCK FD_CLOEXEC

There was a small bug in ERC that prevented a non-repeating timer from
being canceled:

  https://cgit.git.savannah.gnu.org/cgit/emacs.git/commit/?id=2f5fe1a4

While that timer did run in the faulty test, I still can't see how it
might have contributed to the lack of termination observed.

Regarding the pfiles output, the number of FIFOs created for a

  make check LOGFILES=lisp/erc/erc-join-tests.log

on my Solaris 10 VM increases only during the first test, maxing out at
six new ones. It then remains steady, possibly because each of the three
subprocesses per test requires two and they're reused when prior tests
complete. This seems to be reflected by watching

  /proc/$EMACS_PID/fd/
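
For anyone wanting to repeat that observation, here is a minimal sketch
of one way to count the FIFOs. The helper name `count_fifos' is
hypothetical, and it assumes that `test -p' reports the entries under
that directory as FIFOs on the system in question:

```shell
# Hypothetical helper: print how many entries in a directory are FIFOs.
# Uses expr rather than $(( )) so it also runs under an old Bourne shell.
count_fifos() {
    dir=$1
    count=0
    for entry in "$dir"/*; do
        # -p is true only for named pipes; an unmatched glob stays a
        # literal "$dir/*", fails this check, and leaves the count at 0.
        if [ -p "$entry" ]; then
            count=`expr "$count" + 1`
        fi
    done
    echo "$count"
}
```

Calling `count_fifos "/proc/$EMACS_PID/fd"' every few seconds while the
tests run should show whether the total keeps growing or levels off.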

In hopes of cutting down on some noise, I've split the tests up and
renamed them on master so each now runs only a single subprocess. I've
also removed the `ert-skip' guard preventing the faulty part from
running on Solaris. If there's still a problem, those that might hang
(or hopefully now just fail) should be

   erc-autojoin-channels-delayed/network
   erc-autojoin-channels-delayed/nomatch
   erc-autojoin-channels-delayed/server

If you see any failures or "forever" behavior (in which case, please
just kill it), please let me know, and I'll restore the `ert-skip' guard
and try to distill the triggering code down to a few lines without any
ERC-related distractions. Open to other suggestions, of course.

Thanks,
J.P.




This bug report was last modified 31 days ago.


