From debbugs-submit-bounces@debbugs.gnu.org Mon May 16 04:26:28 2022
From: Ludovic Courtès
To: bug-guix@gnu.org
Subject: elogind startup race between shepherd and dbus-daemon
Date: Mon, 16 May 2022 10:26:15 +0200
Message-ID: <877d6lc28o.fsf@inria.fr>

Hello!

Currently (40a729a0e6f1d660b942241416c1e2c567616d4d), shepherd and
dbus-daemon compete to start elogind: shepherd tries to start it
eagerly, and dbus-daemon starts it on-demand upon bus activation.

Sometimes dbus-daemon wins, and thus shepherd tries a few times to start
it anyway, leading to the infamous:

  elogind is already running as PID 123

(elogind checks whether its PID file exists.  Note that you may see that
message also when shepherd wins, because dbus-daemon tries to start it
anyway.)

Eventually, shepherd considers that elogind cannot be started and
disables it.

In addition to being ridiculous, it’s harmful: the ‘xorg-server’ service
(from ‘gdm-service-type’ and ‘sddm-service-type’) depends on ‘elogind’,
so if shepherd loses the race, Xorg isn’t started (on my laptop,
shepherd never loses the race it seems, but I’ve seen it lose half of
the time on a slower machine).

The reason elogind is started by shepherd is explained in this comment:

  ;; Start elogind from the Shepherd rather than waiting
  ;; for bus activation.  This ensures that it can handle
  ;; events like lid close, etc.

This comes from 94a881178af9a9a918ce6de55641daa245c92e73, which was a
fix for .  I believe the justification still holds.

So it would seem that the solution to this is to prevent dbus-daemon
from starting elogind.  We can do that by changing
org.freedesktop.login1.service so that it has “Exec=true” instead of
“Exec=elogind --daemon”.

“Exec=true” is a bit crude because it doesn’t guarantee that elogind is
really started; if that isn’t good enough, we could instead wait for the
PID file or something (as of Shepherd 0.9.0, invoking ‘herd start
elogind’ potentially leads shepherd to start a second instance if the
first one is still being started, so we can’t really do that).

Depending on what we end up with, we might also revisit whether
xorg-server needs to explicitly depend on elogind.

Thoughts?

Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Mon May 16 04:40:16 2022
Date: Mon, 16 May 2022 10:40:05 +0200
Message-Id: <875ym5c1lm.fsf@gnu.org>
To: control@debbugs.gnu.org
From: Ludovic Courtès
Subject: control message for bug #55444

severity 55444 important
quit

From debbugs-submit-bounces@debbugs.gnu.org Mon May 23 22:27:35 2022
From: Maxim Cournoyer
To: Ludovic Courtès
Cc: 55444@debbugs.gnu.org
Subject: Re: bug#55444: elogind startup race between shepherd and dbus-daemon
References: <877d6lc28o.fsf@inria.fr>
Date: Mon, 23 May 2022 22:27:20 -0400
In-Reply-To: <877d6lc28o.fsf@inria.fr> ("Ludovic Courtès"'s message of "Mon, 16 May 2022 10:26:15 +0200")
Message-ID: <87y1yr4qd3.fsf@gmail.com>

Hi,

Ludovic Courtès writes:

> Hello!
>
> Currently (40a729a0e6f1d660b942241416c1e2c567616d4d), shepherd and
> dbus-daemon compete to start elogind: shepherd tries to start it
> eagerly, and dbus-daemon starts it on-demand upon bus activation.
>
> Sometimes dbus-daemon wins, and thus shepherd tries a few times to start
> it anyway, leading to the infamous:
>
> elogind is already running as PID 123

Do we have a system test that sometimes reproduces it, or at least the
above message?

I have some branch where I introduce some D-Bus synchronization
primitives I had started to fix
https://issues.guix.gnu.org/issue/52051, which ended up being fixed
differently (bumping the timeout value); perhaps it could be of use
here.

Thanks,

Maxim

From debbugs-submit-bounces@debbugs.gnu.org Tue May 24 15:26:06 2022
Message-ID: <5a8b5f20da2676e03c696e33c029895292ed397e.camel@gmail.com>
Subject: Re: elogind startup race between shepherd and dbus-daemon
From: Liliana Marie Prikler
To: Ludovic Courtès, 55444@debbugs.gnu.org
Date: Tue, 24 May 2022 21:25:57 +0200
In-Reply-To: <877d6lc28o.fsf@inria.fr>
References: <877d6lc28o.fsf@inria.fr>

Hi,

On Monday, 16.05.2022 at 10:26 +0200, Ludovic Courtès wrote:
> [...]
> So it would seem that the solution to this is to prevent dbus-daemon
> from starting elogind.  We can do that by changing
> org.freedesktop.login1.service so that it has “Exec=true” instead of
> “Exec=elogind --daemon”.
>
> “Exec=true” is a bit crude because it doesn’t guarantee that elogind
> is really started; if that isn’t good enough, we could instead wait
> for the PID file or something (as of Shepherd 0.9.0, invoking ‘herd
> start elogind’ potentially leads shepherd to start a second instance
> if the first one is still being started, so we can’t really do that).

Why does shepherd race with itself here?  That sounds like a very evil
bug.  Rather than waiting for a log file, I'd suggest writing an ad-hoc
Guile script that communicates with shepherd and blocks until shepherd
signals that elogind has been started, but this script too would have
to work around shepherd racing against itself.

> Depending on what we end up with, we might also revisit whether
> xorg-server needs to explicitly depend on elogind.

At least in the case of GDM I think it does heavily depend on elogind.
For the future, I think we also should take over dbus-daemon's
autostart in the same way systemd already has.

Cheers

From debbugs-submit-bounces@debbugs.gnu.org Wed May 25 08:26:32 2022
From: Ludovic Courtès
To: Liliana Marie Prikler
Cc: 55444@debbugs.gnu.org
Subject: Re: bug#55444: elogind startup race between shepherd and dbus-daemon
References: <877d6lc28o.fsf@inria.fr> <5a8b5f20da2676e03c696e33c029895292ed397e.camel@gmail.com>
Date: Wed, 25 May 2022 14:26:13 +0200
In-Reply-To: <5a8b5f20da2676e03c696e33c029895292ed397e.camel@gmail.com> (Liliana Marie Prikler's message of "Tue, 24 May 2022 21:25:57 +0200")
Message-ID: <87zgj5hk7u.fsf@gnu.org>

Hi,

Liliana Marie Prikler writes:

> On Monday, 16.05.2022 at 10:26 +0200, Ludovic Courtès wrote:
>> [...]
>> So it would seem that the solution to this is to prevent dbus-daemon
>> from starting elogind.  We can do that by changing
>> org.freedesktop.login1.service so that it has “Exec=true” instead of
>> “Exec=elogind --daemon”.
>>
>> “Exec=true” is a bit crude because it doesn’t guarantee that elogind
>> is really started; if that isn’t good enough, we could instead wait
>> for the PID file or something (as of Shepherd 0.9.0, invoking ‘herd
>> start elogind’ potentially leads shepherd to start a second instance
>> if the first one is still being started, so we can’t really do that).
> Why does shepherd race with itself here?  That sounds like a very evil
> bug.  Rather than waiting for a log file, I'd suggest writing an ad-hoc
> Guile script that communicates with shepherd and blocks until shepherd
> signals that elogind has been started, but this script too would have
> to work around shepherd racing against itself.

Right.  Currently services have two states: stopped, and started.
Fixing that needs non-trivial changes to how shepherd handles state.

We’ll have to do that (the way I see it, we’ll move state out of and
have a fiber explicitly handle state, including distinguishing between
“stopped” and “starting”), but I think/hope we can fix this bug without
first addressing this issue.

>> Depending on what we end up with, we might also revisit whether
>> xorg-server needs to explicitly depend on elogind.
> At least in the case of GDM I think it does heavily depend on elogind.
> For the future, I think we also should take over dbus-daemon's
> autostart in the same way systemd already has.

Agreed, though that one is trickier: we’d need an implementation of the
D-Bus protocol.  There’s guile-ac-d-bus but it’s probably under-tested.

Thanks,
Ludo’.
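To illustrate the point about service states made in the message above: with only "stopped" and "started", a second start request that arrives while the first is still in progress sees "stopped" and launches a second instance. The sketch below shows how an explicit "starting" state avoids that. It is not Shepherd code (Shepherd is written in Guile and the plan mentioned above involves fibers); Python and threads are used purely for illustration, and the `Service` class, `State` enum and `spawn` callable are hypothetical names.

```python
import threading
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    RUNNING = "running"


class Service:
    """Hypothetical service record with an explicit 'starting' state."""

    def __init__(self, name, spawn):
        self.name = name
        self._spawn = spawn  # callable that launches the daemon (assumption)
        self._state = State.STOPPED
        self._cond = threading.Condition()

    @property
    def state(self):
        with self._cond:
            return self._state

    def start(self):
        """Return True only if this call launched the daemon."""
        with self._cond:
            if self._state is State.RUNNING:
                return False
            if self._state is State.STARTING:
                # Another caller is already starting the service: wait for
                # it to finish instead of spawning a second instance.
                self._cond.wait_for(lambda: self._state is not State.STARTING)
                return False
            self._state = State.STARTING

        # Spawn outside the lock so that waiters can observe STARTING.
        started = False
        try:
            self._spawn()
            started = True
        finally:
            with self._cond:
                # On failure the service goes back to STOPPED and the
                # exception propagates to the caller.
                self._state = State.RUNNING if started else State.STOPPED
                self._cond.notify_all()
        return True
```

With this structure, several concurrent `start()` calls result in a single `spawn()` invocation; the late callers simply block until the first one completes.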
From debbugs-submit-bounces@debbugs.gnu.org Wed May 25 08:46:01 2022
From: Ludovic Courtès
To: Maxim Cournoyer
Cc: 55444@debbugs.gnu.org
Subject: Re: bug#55444: elogind startup race between shepherd and dbus-daemon
References: <877d6lc28o.fsf@inria.fr> <87y1yr4qd3.fsf@gmail.com>
Date: Wed, 25 May 2022 14:45:50 +0200
In-Reply-To: <87y1yr4qd3.fsf@gmail.com> (Maxim Cournoyer's message of "Mon, 23 May 2022 22:27:20 -0400")
Message-ID: <87tu9dhjb5.fsf@gnu.org>

Hi,

Maxim Cournoyer writes:

> Ludovic Courtès writes:
>
>> Hello!
>>
>> Currently (40a729a0e6f1d660b942241416c1e2c567616d4d), shepherd and
>> dbus-daemon compete to start elogind: shepherd tries to start it
>> eagerly, and dbus-daemon starts it on-demand upon bus activation.
>>
>> Sometimes dbus-daemon wins, and thus shepherd tries a few times to start
>> it anyway, leading to the infamous:
>>
>> elogind is already running as PID 123
>
> Do we have a system test that sometimes reproduces it, or at least the
> above message?

Any system along the lines of gnu/system/examples/desktop.tmpl gets this
message, though usually Xorg starts without any problem.

There’s an elogind system test but it doesn’t catch the problem because
it’s a non-deterministic and rather rare issue.

> I have some branch where I introduce some D-Bus synchronization
> primitives I had started to fix
> https://issues.guix.gnu.org/issue/52051, which ended up being fixed
> differently (bumping the timeout value); perhaps it could be of use
> here.

Hmm I had forgotten about that bug, and I wonder if it’s the same bug I
was seeing, and that in fact the shepherd/dbus-daemon race isn’t the
root cause.  The machine where I saw elogind startup failures was
extremely slow to start for other reasons.

Anyhow, getting rid of this race seems like the right thing to do.

Ludo’.
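One option floated at the start of this thread for getting rid of the race is to have the D-Bus activation helper wait for elogind's PID file instead of launching elogind itself. The following is a minimal sketch of such a wait, under stated assumptions: the PID file location is not given anywhere in the thread and must be supplied by the caller, the function names are hypothetical, and polling is used only for simplicity.

```python
import os
import time


def read_live_pid(pid_file):
    """Return the PID recorded in pid_file if that process exists, else None."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None  # file missing, unreadable, or not (yet) a number
    if pid <= 0:
        return None  # never signal a process group by accident
    try:
        os.kill(pid, 0)  # signal 0 performs an existence check only
    except ProcessLookupError:
        return None  # stale PID file
    except PermissionError:
        pass  # process exists but belongs to another user
    return pid


def wait_for_pid_file(pid_file, timeout=30.0, interval=0.1):
    """Poll until pid_file names a live process; return its PID or None."""
    deadline = time.monotonic() + timeout
    while True:
        pid = read_live_pid(pid_file)
        if pid is not None:
            return pid
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

A helper built on this would exit successfully when `wait_for_pid_file` returns a PID and fail on `None`, which is a stronger guarantee than “Exec=true” but still does not prove that elogind has finished registering on the bus.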
From debbugs-submit-bounces@debbugs.gnu.org Fri May 27 09:54:56 2022
From: Ludovic Courtès
To: 55444@debbugs.gnu.org
Subject: Re: bug#55444: elogind startup race between shepherd and dbus-daemon
References: <877d6lc28o.fsf@inria.fr>
Date: Fri, 27 May 2022 15:54:44 +0200
In-Reply-To: <877d6lc28o.fsf@inria.fr> ("Ludovic Courtès"'s message of "Mon, 16 May 2022 10:26:15 +0200")
Message-ID: <87k0a7dqsb.fsf@gnu.org>

Ludovic Courtès writes:

> Currently (40a729a0e6f1d660b942241416c1e2c567616d4d), shepherd and
> dbus-daemon compete to start elogind: shepherd tries to start it
> eagerly, and dbus-daemon starts it on-demand upon bus activation.
>
> Sometimes dbus-daemon wins, and thus shepherd tries a few times to start
> it anyway, leading to the infamous:

Here’s an example where dbus-daemon wins:

--8<---------------cut here---------------start------------->8---
$ sudo grep --color -E '^May 27 .*(dbus.*login|elogind)' /var/log/messages
May 27 15:06:36 localhost dbus-daemon[326]: [system] Activating service name='org.freedesktop.login1' requested by ':1.0' (uid=0 pid=307 comm="/gnu/store/2lis8khrdk0zzjzs5ydi8rs5h6f6wjr7-shadow") (using servicehelper)
May 27 15:06:37 localhost elogind-daemon[335]: New seat seat0.
May 27 15:06:37 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event2 (Power Button)
May 27 15:06:37 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event0 (Power Button)
May 27 15:06:37 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event1 (Sleep Button)
May 27 15:06:41 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event4 (Darfon HP USB Keyboard)
May 27 15:06:43 localhost elogind-daemon[335]: Watching system buttons on /dev/input/event5 (Darfon HP USB Keyboard Consumer Control)
May 27 15:06:48 localhost dbus-daemon[326]: [system] Successfully activated service 'org.freedesktop.login1'
May 27 15:06:54 localhost elogind-daemon[335]: New session c1 of user ludo.
May 27 15:07:45 localhost shepherd[1]: Service elogind has been started.
May 27 15:08:02 localhost shepherd[1]: Respawning elogind.
May 27 15:08:14 localhost shepherd[1]: Service elogind has been started.
May 27 15:08:32 localhost shepherd[1]: Respawning elogind.
May 27 15:08:43 localhost shepherd[1]: Service elogind has been started.
May 27 15:08:47 localhost vmunix: [ 25.123255] shepherd[1]: Service file-system-/sys/fs/cgroup/elogind has been started.
May 27 15:08:53 localhost shepherd[1]: Respawning elogind.
May 27 15:08:59 localhost shepherd[1]: Service elogind has been started.
May 27 15:09:00 localhost vmunix: [ 79.976531] elogind[348]: elogind is already running as PID 335
May 27 15:09:00 localhost vmunix: [ 107.880971] elogind[364]: elogind is already running as PID 335
May 27 15:09:00 localhost vmunix: [ 109.160864] elogind-daemon[335]: New session c2 of user ludo.
May 27 15:09:00 localhost vmunix: [ 135.017068] elogind[369]: elogind is already running as PID 335
May 27 15:09:00 localhost vmunix: [ 159.849027] elogind[370]: elogind is already running as PID 335
May 27 15:09:00 localhost vmunix: [ 181.608889] elogind[371]: elogind is already running as PID 335
May 27 15:09:00 localhost shepherd[1]: Respawning elogind.
May 27 15:09:00 localhost shepherd[1]: Service elogind has been started.
May 27 15:09:04 localhost shepherd[1]: Respawning elogind.
May 27 15:09:04 localhost shepherd[1]: Service elogind has been started.
May 27 15:09:04 localhost elogind[410]: elogind is already running as PID 335
May 27 15:09:04 localhost shepherd[1]: Respawning elogind.
May 27 15:09:04 localhost shepherd[1]: Service elogind has been started.
May 27 15:09:05 localhost elogind[411]: elogind is already running as PID 335
May 27 15:09:05 localhost shepherd[1]: Respawning elogind.
May 27 15:09:05 localhost shepherd[1]: Service elogind has been started.
May 27 15:09:05 localhost elogind[412]: elogind is already running as PID 335
May 27 15:09:05 localhost shepherd[1]: Respawning elogind.
May 27 15:09:05 localhost shepherd[1]: Service elogind has been started.
May 27 15:09:05 localhost elogind[416]: elogind is already running as PID 335
May 27 15:09:05 localhost shepherd[1]: Respawning elogind.
May 27 15:09:05 localhost shepherd[1]: Service elogind has been started.
May 27 15:09:05 localhost elogind[417]: elogind is already running as PID 335
May 27 15:09:05 localhost shepherd[1]: Service elogind has been disabled.
May 27 15:09:08 localhost elogind-daemon[335]: New session c3 of user gdm.
May 27 15:12:08 localhost elogind-daemon[335]: New session c4 of user ludo.
--8<---------------cut here---------------end--------------->8---

(In this case ‘xorg-server’ started but ‘herd status elogind’ shows it
is stopped and disabled.)

Contrast with a successful startup where shepherd wins:

--8<---------------cut here---------------start------------->8---
May 27 11:03:55 localhost shepherd[1]: Service elogind has been started.
May 27 11:03:54 localhost elogind[476]: New seat seat0.
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event3 (Power Button)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event1 (Power Button)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event0 (Lid Switch)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event2 (Sleep Button)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event5 (Dell Dell USB Keyboard)
May 27 11:03:54 localhost dbus-daemon[470]: [system] Activating service name='org.freedesktop.login1' requested by ':1.2' (uid=0 pid=477 comm="/gnu/store/qpaw2f734zlsq153fkn5afcv4k4fk63z-upower") (using servicehelper)
May 27 11:03:54 localhost elogind[476]: Watching system buttons on /dev/input/event4 (AT Translated Set 2 keyboard)
May 27 11:03:55 localhost elogind[496]: elogind is already running as PID 476
May 27 11:03:55 localhost dbus-daemon[470]: [system] Successfully activated service 'org.freedesktop.login1'
May 27 11:04:03 localhost vmunix: [ 2089.808033] shepherd[1]: Service file-system-/sys/fs/cgroup/elogind has been started.
May 27 11:04:04 localhost elogind[476]: New session c1 of user gdm.
--8<---------------cut here---------------end--------------->8---

Besides, Gentoo recommends starting it from runit rather than on-demand:

  https://wiki.gentoo.org/wiki/Elogind#Service
  https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-auth/elogind/elogind-241.1.ebuild?id=bea47cee314829edbb41453d1e89fa1d1d3f9993

They don’t seem to be doing anything to avoid the race though.

Ludo’.
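The two excerpts above differ mainly in how many respawn attempts and "already running" messages appear, and in whether shepherd ends up disabling the service. As a rough aid for checking other machines, here is a small sketch that summarizes those markers from syslog-style lines. The regular expressions are derived only from the message formats visible in the excerpts above; the `summarize` function and the keys of its result are hypothetical.

```python
import re

# Patterns taken from the log lines quoted in this thread.
ALREADY_RUNNING = re.compile(
    r"elogind\[(\d+)\]: elogind is already running as PID (\d+)")
RESPAWNING = re.compile(r"shepherd\[1\]: Respawning elogind\.")
DISABLED = re.compile(r"shepherd\[1\]: Service elogind has been disabled\.")


def summarize(lines):
    """Count race symptoms in an iterable of log lines."""
    respawns = 0
    already_running = 0
    disabled = False
    running_pids = set()
    for line in lines:
        match = ALREADY_RUNNING.search(line)
        if match:
            already_running += 1
            # PID of the instance that was started first.
            running_pids.add(int(match.group(2)))
        elif RESPAWNING.search(line):
            respawns += 1
        elif DISABLED.search(line):
            disabled = True
    return {
        "respawns": respawns,
        "already_running": already_running,
        "disabled": disabled,
        "running_pids": sorted(running_pids),
    }
```

It could be run as `summarize(open("/var/log/messages"))`, using the log file named in the grep command above; a result with `disabled` set to true corresponds to the case where dbus-daemon won the race.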
From debbugs-submit-bounces@debbugs.gnu.org Fri May 27 16:55:00 2022
From: Ludovic Courtès
To: 55444@debbugs.gnu.org
Subject: Re: bug#55444: elogind startup race between shepherd and dbus-daemon
References: <877d6lc28o.fsf@inria.fr>
Date: Fri, 27 May 2022 22:54:49 +0200
In-Reply-To: <877d6lc28o.fsf@inria.fr> ("Ludovic Courtès"'s message of "Mon, 16 May 2022 10:26:15 +0200")
Message-ID: <878rqmelwm.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8

Hey there,

Ludovic Courtès skribis:

> So it would seem that the solution to this is to prevent dbus-daemon
> from starting elogind. We can do that by changing
> org.freedesktop.login1.service so that it has “Exec=true” instead of
> “Exec=elogind --daemon”.
>
> “Exec=true” is a bit crude because it doesn’t guarantee that elogind is
> really started; if that isn’t good enough, we could instead wait for the
> PID file or something (as of Shepherd 0.9.0, invoking ‘herd start
> elogind’ potentially leads shepherd to start a second instance if the
> first one is still being started, so we can’t really do that).

The patch below addresses that: it changes the “Exec=” line of
‘org.freedesktop.login1’ to refer to a wrapper. That wrapper connects
to shepherd and waits until ‘elogind’ is started.

That way, if dbus-daemon comes first, it won’t actually launch anything
and instead wait for the Shepherd ‘elogind’ service to be up. (And if
it comes second, dbus-daemon won’t try to launch anything, so no
spurious “already running” messages.)

I tested it in a ‘desktop.tmpl’ VM, quickly logging in on tty1.
On /var/log/messages, you can see the “Activating ….login1” message from
dbus-daemon, followed by “Service elogind started” from shepherd,
followed by “Successfully activated ….login1” from dbus-daemon.

The “elogind” system test passes too.

Thoughts? Objections?

Ludo’.

--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline; filename=0001-services-elogind-When-started-by-dbus-daemon-wait-fo.patch
Content-Description: the patch

>From 7ef63d7426677961afd2bd937af19b08209c5b70 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?=
Date: Fri, 27 May 2022 22:41:55 +0200
Subject: [PATCH] services: elogind: When started by dbus-daemon, wait for the
 Shepherd service.

Fixes .

Previously shepherd and dbus-daemon would race to start elogind. In
some cases (for instance if one logs in quickly enough on the tty),
dbus-daemon would "win" and start elogind before shepherd has had a
chance to do it. Consequently, shepherd would fail to start elogind and
mark it as stopped and disabled, in turn preventing services that depend
on it such as 'xorg-server' from starting.

* gnu/services/desktop.scm (elogind-dbus-service): Rewrite to refer to a
wrapper that waits for the 'elogind' Shepherd service.
---
 gnu/services/desktop.scm | 79 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 75 insertions(+), 4 deletions(-)

diff --git a/gnu/services/desktop.scm b/gnu/services/desktop.scm
index 24fd43a207..318107a2ca 100644
--- a/gnu/services/desktop.scm
+++ b/gnu/services/desktop.scm
@@ -1075,10 +1075,81 @@ (define-syntax-rule (ini-file config file clause ...)
    ("HybridSleepMode" (sleep-list elogind-hybrid-sleep-mode))))
 
 (define (elogind-dbus-service config)
-  (list (wrapped-dbus-service (elogind-package config)
-                              "libexec/elogind/elogind"
-                              `(("ELOGIND_CONF_FILE"
-                                 ,(elogind-configuration-file config))))))
+  "Return a @file{org.freedesktop.login1.service} file that tells D-Bus how to
+\"start\" elogind. In practice though, our elogind is started when booting by
+shepherd. Thus, the @code{Exec} line of this @file{.service} file does not
+explain how to start elogind; instead, it spawns a wrapper that waits for the
+@code{elogind} shepherd service. This avoids a race condition where both
+@command{shepherd} and @command{dbus-daemon} would attempt to start elogind."
+  ;; For more info on the elogind startup race, see
+  ;; .
+
+  (define elogind
+    (elogind-package config))
+
+  (define wrapper
+    (program-file "elogind-dbus-shepherd-sync"
+                  (with-imported-modules '((gnu services herd))
+                    #~(begin
+                        (use-modules (gnu services herd)
+                                     (srfi srfi-1)
+                                     (ice-9 match))
+
+                        (define (elogind-service? service)
+                          (memq 'elogind (live-service-provision service)))
+
+                        (define max-attempts
+                          ;; Number of attempts before assuming elogind failed
+                          ;; to start.
+                          20)
+
+                        ;; Repeatedly check whether the 'elogind' shepherd
+                        ;; service is up and running. (As of Shepherd 0.9.1,
+                        ;; we cannot just call the 'start' method and wait for
+                        ;; it: it would spawn an additional elogind process.)
+                        (let loop ((attempts 0))
+                          (define services
+                            (current-services))
+
+                          (when (>= attempts max-attempts)
+                            (format (current-error-port)
+                                    "elogind shepherd service not started~%")
+                            (exit 2))
+
+                          (match (find elogind-service? services)
+                            (#f
+                             (format (current-error-port)
+                                     "no elogind shepherd service~%")
+                             (exit 1))
+                            (service
+                             (unless (live-service-running service)
+                               (sleep 1)
+                               (loop (+ attempts 1))))))))))
+
+  (define build
+    (with-imported-modules '((guix build utils))
+      #~(begin
+          (use-modules (guix build utils)
+                       (ice-9 match))
+
+          (define service-directory
+            "/share/dbus-1/system-services")
+
+          (mkdir-p (dirname (string-append #$output service-directory)))
+          (copy-recursively (string-append #$elogind service-directory)
+                            (string-append #$output service-directory))
+          (symlink (string-append #$elogind "/etc") ;for etc/dbus-1
+                   (string-append #$output "/etc"))
+
+          ;; Replace the "Exec=" line of the 'org.freedesktop.login1.service'
+          ;; file with one that refers to WRAPPER instead of elogind.
+          (match (find-files #$output "\\.service$")
+            ((file)
+             (substitute* file
+               (("Exec[[:blank:]]*=.*" _)
+                (string-append "Exec=" #$wrapper "\n"))))))))
+
+  (list (computed-file "elogind-dbus-service-wrapper" build)))
 
 (define (pam-extension-procedure config)
   "Return an extension for PAM-ROOT-SERVICE-TYPE that ensures that all the PAM
-- 
2.36.0

--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Sat May 28 04:13:35 2022
From: Josselin Poiret
To: Ludovic Courtès, 55444@debbugs.gnu.org
Subject: Re: bug#55444: elogind startup race between shepherd and dbus-daemon
In-Reply-To: <878rqmelwm.fsf@gnu.org>
References: <877d6lc28o.fsf@inria.fr> <878rqmelwm.fsf@gnu.org>
Date: Sat, 28 May 2022 10:13:26 +0200
Message-ID: <87leumgjmh.fsf@jpoiret.xyz>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Hello Ludo,

Ludovic Courtès writes:

> The patch below addresses that: it changes the “Exec=” line of
> ‘org.freedesktop.login1’ to refer to a wrapper. That wrapper connects
> to shepherd and waits until ‘elogind’ is started.
>
> That way, if dbus-daemon comes first, it won’t actually launch anything
> and instead wait for the Shepherd ‘elogind’ service to be up. (And if
> it comes second, dbus-daemon won’t try to launch anything, so no
> spurious “already running” messages.)
>
> I tested it in a ‘desktop.tmpl’ VM, quickly logging in on tty1. On
> /var/log/messages, you can see the “Activating ….login1” message from
> dbus-daemon, followed by “Service elogind started” from shepherd,
> followed by “Successfully activated ….login1” from dbus-daemon.
>
> The “elogind” system test passes too.
>
> Thoughts? Objections?
>
> Ludo’.

Great idea! The patch LGTM, although I'd argue that most of the wrapper
code could belong in (gnu services herd), in something like
(wait-for-services #:select? select? #:retries (n 20)) so that someone
else doesn't end up recoding this for another service.

Best,
-- 
Josselin Poiret

From debbugs-submit-bounces@debbugs.gnu.org Sat May 28 17:26:34 2022
From: Ludovic Courtès
To: Josselin Poiret
Cc: 55444-done@debbugs.gnu.org
Subject: Re: bug#55444: elogind startup race between shepherd and dbus-daemon
References: <877d6lc28o.fsf@inria.fr> <878rqmelwm.fsf@gnu.org> <87leumgjmh.fsf@jpoiret.xyz>
Date: Sat, 28 May 2022 23:26:23 +0200
In-Reply-To: <87leumgjmh.fsf@jpoiret.xyz> (Josselin Poiret's message of "Sat, 28 May 2022 10:13:26 +0200")
Message-ID: <87czfx9wn4.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

Hello,

Josselin Poiret skribis:

> Great idea! The patch LGTM, although I'd argue that most of the wrapper
> code could belong in (gnu services herd), in something like
> (wait-for-services #:select? select? #:retries (n 20)) so that someone
> else doesn't end up recoding this for another service.

Good idea, done here:

  f383838a09 services: elogind: When started by dbus-daemon, wait for the Shepherd service.
  b04ae71def services: herd: Add 'wait-for-service'.

Let me know what you think!

Thanks,
Ludo’.

From unknown Mon Jun 23 18:30:36 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sun, 26 Jun 2022 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
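
Note on the polling approach discussed in this thread: the wrapper in the patch polls up to 20 times, one second apart, and the follow-up suggestion was to factor that loop into a reusable helper. The sketch below only illustrates that bounded-retry pattern in plain Guile. It is not the code from commit b04ae71def; the names `wait-until`, `#:retries`, and `#:interval` are hypothetical, and the shepherd-specific predicate appears only in a comment because it needs (gnu services herd) and a running shepherd.

```scheme
;; Hypothetical sketch of a bounded polling helper; NOT the actual
;; 'wait-for-service' procedure added to (gnu services herd).

(define* (wait-until ready? #:key (retries 20) (interval 1))
  "Call the thunk READY? at most RETRIES times, sleeping INTERVAL seconds
after each failed attempt.  Return #t as soon as READY? returns true, and
#f if all attempts fail."
  (let loop ((attempts 0))
    (cond ((>= attempts retries) #f)      ;give up: too many attempts
          ((ready?) #t)                   ;condition met
          (else
           (sleep interval)
           (loop (+ attempts 1))))))

;; With (gnu services herd) loaded, READY? could be modeled on the patch:
;;
;;   (lambda ()
;;     (let ((service (find elogind-service? (current-services))))
;;       (and service (live-service-running service))))
;;
;; Stand-in predicate for demonstration: true from the third call onward.
(define calls 0)

(define (ready-after-three?)
  (set! calls (+ calls 1))
  (>= calls 3))

(display (wait-until ready-after-three? #:retries 5 #:interval 0))
(newline)
```

Unlike the wrapper in the patch, this sketch does not distinguish "no such service" from "service not yet running"; a real helper would presumably need to report those cases separately, as the patch does with exit codes 1 and 2.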