From unknown Tue Jun 17 01:39:05 2025
From: bug#69188 <69188@debbugs.gnu.org>
To: bug#69188 <69188@debbugs.gnu.org>
Subject: Status: 30.0.50; project-files + project-find-file is slow in large repositories
Reply-To: bug#69188 <69188@debbugs.gnu.org>
Date: Tue, 17 Jun 2025 08:39:05 +0000

retitle 69188 30.0.50; project-files + project-find-file is slow in large repositories
reassign 69188 emacs
submitter 69188 Spencer Baugh
severity 69188 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 18 13:21:15 2024
From: Spencer Baugh
To: bug-gnu-emacs@gnu.org
Cc: Dmitry Gutov
Subject: 30.0.50; project-files + project-find-file is slow in large repositories
Date: Thu, 15 Feb 2024 17:55:46 -0500

(project-files (project-current)) takes around 1 second in Linux (80k
files) and 7 seconds in my larger (500k file) repository.
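
A minimal sketch of how a timing like this can be taken, assuming the
current buffer belongs to a project.  `my/time-project-files' is a
hypothetical helper, not part of project.el.  Note that `benchmark-run'
only treats its first argument as a repetition count when it is a
literal number or a symbol, hence the local variable N.

```elisp
(require 'benchmark)
(require 'project)

;; Hypothetical helper for illustration; not part of project.el.
(defun my/time-project-files (&optional repetitions)
  "Time `project-files' for the current project, REPETITIONS times.
Return (ELAPSED-SECONDS GC-COUNT GC-SECONDS), or nil outside a project."
  (let ((pr (project-current))       ; nil when no project is found
        (n (or repetitions 1)))      ; symbol, so benchmark-run sees a count
    (when pr
      (benchmark-run n
        (project-files pr)))))

;; Example: ten iterations, to smooth out run-to-run variation.
(my/time-project-files 10)
```

Results are only comparable between runs when project.el is in the same
state each time (byte-compiled or not), since interpreted code can
dominate the measurement.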

With this patch:

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index c7c07c3d34c..037beaa835a 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -667,12 +667,15 @@
                                (setq i (concat i "**"))))
                            i)))
                      extra-ignores)))))
-         (setq files
-               (mapcar
-                (lambda (file) (concat default-directory file))
-                (split-string
-                 (apply #'vc-git--run-command-string nil "ls-files" args)
-                 "\0" t)))
+         (with-temp-buffer
+           (let ((ok (apply #'vc-git--out-ok "ls-files" args))
+                 (pt (point-min)))
+             (unless ok
+               (error "File listing failed: %s" (buffer-string)))
+             (goto-char pt)
+             (while (search-forward "\0" nil t)
+               (push (concat default-directory (buffer-substring-no-properties pt (1- (point)))) files)
+               (setq pt (point)))))
          (when (project--vc-merge-submodules-p default-directory)
            ;; Unfortunately, 'ls-files --recurse-submodules' conflicts with '-o'.
            (let* ((submodules (project--git-submodules))

project-files in Linux takes around .75 seconds.

If I further remove the (concat default-directory ...) around each file,
it speeds up to .5 seconds.

(Note that git ls-files itself takes only around 20 milliseconds)

My large repository (which uses Mercurial) has a custom project-files
which is basically:

(with-temp-buffer
  (unless (zerop (apply #'call-process "rhg" nil t nil "files"))
    (error "File listing failed: %s" (buffer-string)))
  (goto-char (point-min))
  (let ((pt (point))
        res)
    (while (search-forward "\n" nil t)
      (push (file-name-concat default-directory (buffer-substring-no-properties pt (1- (point)))) res)
      (setq pt (point)))
    res))

Likewise, removing the (concat default-directory ...) speeds my
project-files up from 7 seconds to 4.5 seconds.

This is especially silly because project-find-file then just removes
this default-directory again from all the files, which has yet more
overhead.

My proposal: Could we find a way to make the default-directory not
necessary for the files returned from project-files?

Perhaps project-files could be allowed to return relative file paths
which are relative to the project root. Then in the common case where
all the files are within the project root, project-find-file would be
way faster. Happy to implement this, if it makes sense.

Another optimization I've considered: We could run the process
asynchronously so project-files parsing can be parallel with the
process; but the process is usually very fast anyway, that's not most of
the overhead, so that won't be a big win.

However, that would make it easy for project-files as a whole to be
asynchronous. Then that would allow project-find-file to start the
listing in the background, and then we'd write a completion table which
completes only over whatever files we've already read into Emacs. I
think this would be a lot nicer for most use-cases, and I'd again be
happy to implement this.

Also happy to implement any other optimizations you think might make
sense.


In GNU Emacs 30.0.50 (build 37, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2024-02-13 built on
 igm-qws-u22796a
Repository revision: a24a2b1ceb12f11c9d345190fbf554f27c4ec186
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.9 (Green Obsidian)

Configured using:
 'configure -C --with-x-toolkit=lucid 'CFLAGS=-O0 -g3'
 --without-native-compilation --without-gif'

Configured features:
CAIRO DBUS FREETYPE GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON
LIBSELINUX LIBSYSTEMD LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG RSVG
SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM
XINPUT2 XPM LUCID ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  minibuffer-regexp-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd touch-screen tool-bar dnd fontset
image regexp-opt fringe tabulated-list replace newcomment text-mode
lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch
easymenu timer select scroll-bar mouse jit-lock font-lock syntax
font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads dbusbind inotify
dynamic-setting system-font-setting font-render-setting cairo x-toolkit
xinput2 x multi-tty move-toolbar make-network-process emacs)

Memory information:
((conses 16 65052 9318) (symbols 48 9539 0) (strings 32 22452 1449)
 (string-bytes 1 659675) (vectors 16 9245) (vector-slots 8 111110 9295)
 (floats 8 40 17) (intervals 56 262 0) (buffers 976 10))

From 
debbugs-submit-bounces@debbugs.gnu.org Sun Feb 18 13:57:17 2024
Date: Sun, 18 Feb 2024 20:56:43 +0200
Message-Id: <86y1bhr47o.fsf@gnu.org>
From: Eli Zaretskii
To: Spencer Baugh
Cc: dmitry@gutov.dev, 69233@debbugs.gnu.org
In-Reply-To: (message from Spencer Baugh on Thu, 15 Feb 2024 17:55:46 -0500)
Subject: Re: bug#69233: 30.0.50; project-files + project-find-file is slow in large repositories

merge 69233 69188
thanks

> Cc: Dmitry Gutov
> From: Spencer Baugh
> Date: Thu, 15 Feb 2024 17:55:46 -0500
>
> (project-files (project-current)) takes around 1 second in Linux (80k
> files) and 7 seconds in my larger (500k file) repository.

This is a duplicate of another bug report you submitted not long ago.

From debbugs-submit-bounces@debbugs.gnu.org Fri Feb 23 16:55:28 2024
From: Spencer Baugh
To: Eli Zaretskii
Cc: Dmitry Gutov, 69233@debbugs.gnu.org, 69188@debbugs.gnu.org
Subject: Re: bug#69188: 30.0.50; project-files + project-find-file is slow in large repositories
In-Reply-To: <86bk8dr0g1.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 18 Feb 2024 22:18:06 +0200")
References: <86y1bhr47o.fsf@gnu.org> <86frxpr1yl.fsf@gnu.org> <391ea08d-9d52-4f03-a602-045b76ac862c@gutov.dev> <86bk8dr0g1.fsf@gnu.org>
Date: Fri, 23 Feb 2024 16:34:38 -0500

Eli Zaretskii writes:

>> Date: Sun, 18 Feb 2024 22:11:43 +0200
>> Cc: sbaugh@janestreet.com, 69233@debbugs.gnu.org
>> From: Dmitry Gutov
>>
>> On 18/02/2024 21:45, Eli Zaretskii wrote:
>> >> Date: Sun, 18 Feb 2024 21:42:37 +0200
>> >> Cc:69233@debbugs.gnu.org
>> >> From: Dmitry Gutov
>> >>
>> >> On 18/02/2024 20:56, Eli Zaretskii wrote:
>> >>> This is a duplicate of another bug report you submitted not long ago.
>> >> Any reason I didn't receive the first one to my inbox?
>> > I don't have the foggiest, sorry.
>>
>> It seems Spencer didn't get the confirmation email either, or he
>> wouldn't resubmit.
>
> One can know if debbugs received a report via the Web interface.

Yes, it seems that all my email was backed up for a day or so, for
whatever reason. Sorry for the noise.

(Or maybe I just think this is such an important bug that I submitted it
twice :) )

From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 12 22:34:42 2024
Message-ID: <1b566e9e-eca5-4746-8e31-4155d35ce7a8@gutov.dev>
Date: Sat, 13 Apr 2024 05:34:18 +0300
Subject: Re: bug#69188: 30.0.50; project-files + project-find-file is slow in large repositories
To: Spencer Baugh, 69188@debbugs.gnu.org
From: Dmitry Gutov

Hi Spencer,

Sorry about the wait.

On 16/02/2024 00:55, Spencer Baugh wrote:
>
> (project-files (project-current)) takes around 1 second in Linux (80k
> files) and 7 seconds in my larger (500k file) repository.
>
> With this patch:
> diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
> index c7c07c3d34c..037beaa835a 100644
> --- a/lisp/progmodes/project.el
> +++ b/lisp/progmodes/project.el
> @@ -667,12 +667,15 @@
>                                 (setq i (concat i "**"))))
>                             i)))
>                       extra-ignores)))))
> -         (setq files
> -               (mapcar
> -                (lambda (file) (concat default-directory file))
> -                (split-string
> -                 (apply #'vc-git--run-command-string nil "ls-files" args)
> -                 "\0" t)))
> +         (with-temp-buffer
> +           (let ((ok (apply #'vc-git--out-ok "ls-files" args))
> +                 (pt (point-min)))
> +             (unless ok
> +               (error "File listing failed: %s" (buffer-string)))
> +             (goto-char pt)
> +             (while (search-forward "\0" nil t)
> +               (push (concat default-directory (buffer-substring-no-properties pt (1- (point)))) files)
> +               (setq pt (point)))))
>           (when (project--vc-merge-submodules-p default-directory)
>             ;; Unfortunately, 'ls-files --recurse-submodules' conflicts with '-o'.
>             (let* ((submodules (project--git-submodules))
>
> project-files in Linux takes around .75 seconds.

The patch makes sense (and the approach works okay in
project--files-in-directory), so this is something I've made a few
attempts to use in the past.

However, the measurements on my machine show a much smaller improvement
-- just 3-4%. I.e. if I just evaluate the functions interpreted or run
them just once, the variations between the runs far exceed the
difference in runtimes (around 450ms with a Linux repository checkout
from 2021, 70k files).

A stricter comparison works out like this:

1. Apply the patch (or not),
2. M-x byte-compile-file
3. (load "project.elc")
4. (benchmark-run 10 (project-files (project-current)))

When I run these in my working session one after another, the 10
iteration benchmark works out to 4.09s vs 3.93s (master vs your patch).

(4.093848777 44 1.6119981489999944)

vs

(3.9392906549999998 41 1.499010061)

With 'emacs -Q', however, it's vice versa:

(3.777694389 130 1.2422826310000001)

vs

(3.889905663 165 1.46846598)

It seems like, maybe, the longer running session is more sensitive to
the allocation of the initial long string than the fresh session.

In any case, I don't mind switching to the other approach. Just
wondering where the difference between our machines might come from.

Last but not least, when/if we apply this, we should keep the fix for
bug#66806 in there. Good news is it doesn't seem to affect performance.

> If I further remove the (concat default-directory ...) around each file,
> it speeds up to .5 seconds.
>
> (Note that git ls-files itself takes only around 20 milliseconds)
>
> My large repository (which uses Mercurial) has a custom project-files
> which is basically:
>
> (with-temp-buffer
>   (unless (zerop (apply #'call-process "rhg" nil t nil "files"))
>     (error "File listing failed: %s" (buffer-string)))
>   (goto-char (point-min))
>   (let ((pt (point))
>         res)
>     (while (search-forward "\n" nil t)
>       (push (file-name-concat default-directory (buffer-substring-no-properties pt (1- (point)))) res)
>       (setq pt (point)))
>     res))
>
> Likewise, removing the (concat default-directory ...) speeds my
> project-files up from 7 seconds to 4.5 seconds.
>
> This is especially silly because project-find-file then just removes
> this default-directory again from all the files, which has yet more
> overhead.

This is, indeed, something that should show a universal improvement.
Around 20% here with the Linux repository test.

> My proposal: Could we find a way to make the default-directory not
> necessary for the files returned from project-files?
>
> Perhaps project-files could be allowed to return relative file paths
> which are relative to the project root. Then in the common case where
> all the files are within the project root, project-find-file would be
> way faster. Happy to implement this, if it makes sense.

Yep, that should make sense. Originally the idea was to keep it more
universal so that lists of files coming from the "external roots" could
be handled the same way (used in the two *-or-external-* commands).

But indeed it's the relatively rare case, so it'd be better to avoid
paying the performance penalty, especially when the subsequent handling
could do without the added prefix. And even the "rare case" could be
split into separate calls instead of having all files returned at once.

My main concern is backward compatibility, so that 3rd party callers
don't break after the update.

I think there are basically two approaches:

- A new defvar like project-use-relative-names,
- Or a new argument for 'project-files', e.g. called RELATIVE.

Both options are relatively clunky, and the second one might also fail
to work when DIRS is non-nil (or would have to fall back to absolute
names anyway), so I'm leaning toward the first one. It might also allow
certain code to be written supporting both relative and absolute names
(e.g. a process call both binds default-directory to root and keeps the
file names as-is -- the relative ones would be interpreted as such, the
rest just as they are interpreted now).

Both project-find-file and project-find-regexp should be able to
benefit. Although the former might require a bigger update, given that
the current project-read-file-name-function options don't expect
relative names. Ideally we'd have a smoother migration for custom
p-r-f-n-f functions, but I don't have any good ideas there.

> Another optimization I've considered: We could run the process
> asynchronously so project-files parsing can be parallel with the
> process; but the process is usually very fast anyway, that's not most of
> the overhead, so that won't be a big win.

Right. This came up in bug#64735, and together with the patch in
bug#66020 the asynchronous file listing can run a bit faster than the
synchronous implementation.
I'm guessing the difference won't be huge in your case, since either way
most time remains spent in Lisp code and GC. But if we take advantage of
this by improving the UIs at the same time, this can be a real win.

This should go into a separate discussion, I think, but to quickly sum
up my thinking on the subject:

- Ideally project-files implementations for sync and async UIs should
  always look the same. Hopefully the "async" implementation looks the
  same or almost the same as the "sync" one. Threads might help.
- project-find-regexp could benefit from this a lot, first by running
  the search in parallel to the file listing, and second by showing the
  results right away (the current advantage of 'M-x grep'). The
  difficult part is to have the "async" Xref interface as well (can we
  do this without extending the current one? probably not). The UI also
  needs to have some "running ..." indicator, as well as a way to abort
  the search, killing both processes - that adds requirements to "async
  Xref" as well.

> However, that would make it easy for project-files as a whole to be
> asynchronous. Then that would allow project-find-file to start the
> listing in the background, and then we'd write a completion table which
> completes only over whatever files we've already read into Emacs. I
> think this would be a lot nicer for most use-cases, and I'd again be
> happy to implement this.

Could this be that simple? Whatever the source of the file listing, as
soon as the UI (or completion styles) calls try-completion or
all-completions, the search has to finish first, shouldn't it? That
seems like the semantics of this API.

Or if perhaps we allow it to operate on incomplete results, how would we
indicate to the user at the end that the scan has finished, and they can
press TAB once more to refresh the results? Or perhaps to be able to
find a file they hadn't managed to find in the incomplete set.

This seems like it might require both a new UI and an extension of
completion table API.
E.g. in certain cases we could say that we only need N matches, so if
the current incomplete set can provide as many, we don't have to wait
until the end. But 'try-completion' would become unreliable either way.

Even if keeping to the most conservative approach, though, it should be
possible to at least render the prompt before the file listing is
finished. That could make the UI look a bit more responsive.

From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 16 19:49:12 2024
Content-Type: multipart/mixed; boundary="------------Ee3iJUYrcK7W1C7AZeYwJ6pY"
Message-ID: <4e8e8f14-26be-4a50-b47b-a0373ce19b9a@gutov.dev>
Date: Wed, 17 Apr 2024 02:48:44 +0300
MIME-Version: 1.0
Subject: Re: bug#69188: 30.0.50; project-files + project-find-file is slow in large repositories
From: Dmitry Gutov
To: Spencer Baugh, 69188@debbugs.gnu.org
References: <1b566e9e-eca5-4746-8e31-4155d35ce7a8@gutov.dev>
In-Reply-To: <1b566e9e-eca5-4746-8e31-4155d35ce7a8@gutov.dev>

This is a multi-part message in MIME format.
--------------Ee3iJUYrcK7W1C7AZeYwJ6pY
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 13/04/2024 05:34, Dmitry Gutov wrote:
> Both options are relatively clunky, and the second one might also fail
> to work when DIRS is non-nil (or would have to fall back to absolute
> names anyway), so I'm leaning toward the first one. It might also allow
> certain code to be written supporting both relative and absolute names
> (e.g. a process call both binds default-directory to root and keeps the
> file names as-is -- the relative ones would be interpreted as such, the
> rest just as they are interpreted now).

Here's how that change can look.

The patch should demonstrate both the performance improvements for
project-find-file and project-find-regexp, and some awkwardness in the
implementation, chiefly due to backward compatibility.

Guess more tests will be required, at the very least.
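
For illustration only, here is a minimal sketch of the opt-in pattern
described above; it is not the code in the attached patch.  All `my/'
names are hypothetical stand-ins: `my/files-relative-names' plays the
role of the proposed defvar and `my/list-files' the role of a
project-files backend.  Callers that understand relative names let-bind
the flag, and everyone else keeps getting absolute names.

```elisp
;; Hypothetical names throughout; nothing here is part of project.el.

(defvar my/files-relative-names nil
  "When non-nil, `my/list-files' may return names relative to the root.
Stand-in for the flag proposed in this thread.")

(defun my/list-files (root names)
  "Return NAMES, which are relative to ROOT, in the form the caller expects.
Absolute by default; relative when `my/files-relative-names' is non-nil."
  (if my/files-relative-names
      names                             ; skip the per-file concat entirely
    (mapcar (lambda (name) (concat root name)) names)))

(defun my/absolute-names (files root)
  "Expand FILES against ROOT; names that are already absolute pass through."
  (mapcar (lambda (file) (expand-file-name file root)) files))

;; An old-style caller is unaffected and still sees absolute names.
(my/list-files "/proj/" '("a.el" "lib/b.el"))

;; A new-style caller opts in, then expands only where it has to.
(let ((my/files-relative-names t))
  (my/absolute-names (my/list-files "/proj/" '("a.el" "lib/b.el"))
                     "/proj/"))
```

Keeping the default at nil is what addresses the backward-compatibility
concern: third-party callers that never bind the flag see no change.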
--------------Ee3iJUYrcK7W1C7AZeYwJ6pY Content-Type: text/x-patch; charset=UTF-8; name="project-files-relative-names.diff" Content-Disposition: attachment; filename="project-files-relative-names.diff" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL2xpc3AvcHJvZ21vZGVzL3Byb2plY3QuZWwgYi9saXNwL3Byb2dtb2Rl cy9wcm9qZWN0LmVsCmluZGV4IDAwMGEwNTgwNGE4Li41NjdhMjVlMDkwNiAxMDA2NDQKLS0t IGEvbGlzcC9wcm9nbW9kZXMvcHJvamVjdC5lbAorKysgYi9saXNwL3Byb2dtb2Rlcy9wcm9q ZWN0LmVsCkBAIC0zMjMsNiArMzIzLDEyIEBAIHByb2plY3QtLWZpbGUtY29tcGxldGlvbi10 YWJsZQogKGNsLWRlZm1ldGhvZCBwcm9qZWN0LXJvb3QgKChwcm9qZWN0IChoZWFkIHRyYW5z aWVudCkpKQogICAoY2RyIHByb2plY3QpKQogCisoZGVmdmFyIHByb2plY3QtZmlsZXMtcmVs YXRpdmUtbmFtZXMgbmlsCisgICJXaGVuIG5vbi1uaWwsIGBwcm9qZWN0LWZpbGVzJyBpcyBh bGxvd2VkIHRvIHJldHVybiByZWxhdGl2ZSBuYW1lcy4KK1RoZSBuYW1lcyB3aWxsIGJlIHJl bGF0aXZlIHRvIHRoZSBwcm9qZWN0IHJvb3QuICBBbmQgdGhpcyBjYW4gb25seQoraGFwcGVu IHdoZW4gYWxsIHJldHVybmVkIGZpbGVzIGFyZSBpbiB0aGUgc2FtZSBkaXJlY3RvcnkuIE1l YW5pbmcsIHRoZQorRElSUyBhcmd1bWVudCBoYXMgdG8gYmUgbmlsIG9yIGhhdmUgb25seSBv bmUgZWxlbWVudC4iKQorCiAoY2wtZGVmZ2VuZXJpYyBwcm9qZWN0LWZpbGVzIChwcm9qZWN0 ICZvcHRpb25hbCBkaXJzKQogICAiUmV0dXJuIGEgbGlzdCBvZiBmaWxlcyBpbiBkaXJlY3Rv cmllcyBESVJTIGluIFBST0pFQ1QuCiBESVJTIGlzIGEgbGlzdCBvZiBhYnNvbHV0ZSBkaXJl Y3RvcmllczsgaXQgc2hvdWxkIGJlIHNvbWUKQEAgLTM4MCw4ICszODYsMTAgQEAgcHJvamVj dC0tZmlsZXMtaW4tZGlyZWN0b3J5CiAgICAgICAgICAgICAgICAgcmVzKQogICAgICAgICAg IChzZXRxIHB0IChwb2ludCkpKSkpCiAgICAgKHByb2plY3QtLXJlbW90ZS1maWxlLW5hbWVz Ci0gICAgIChtYXBjYXIgKGxhbWJkYSAocykgKGNvbmNhdCBkZm4gcykpCi0gICAgICAgICAg ICAgKHNvcnQgcmVzICMnc3RyaW5nPCkpKSkpCisgICAgIChpZiBwcm9qZWN0LWZpbGVzLXJl bGF0aXZlLW5hbWVzCisgICAgICAgICAoc29ydCByZXMgIydzdHJpbmc8KQorICAgICAgICht YXBjYXIgKGxhbWJkYSAocykgKGNvbmNhdCBkZm4gcykpCisgICAgICAgICAgICAgICAoc29y dCByZXMgIydzdHJpbmc8KSkpKSkpCiAKIChkZWZ1biBwcm9qZWN0LS1yZW1vdGUtZmlsZS1u YW1lcyAobG9jYWwtZmlsZXMpCiAgICJSZXR1cm4gTE9DQUwtRklMRVMgYXMgaWYgdGhleSB3 ZXJlIG9uIHRoZSBzeXN0ZW0gb2YgYGRlZmF1bHQtZGlyZWN0b3J5Jy4KQEAgLTY4OSw3ICs2 
OTcsOSBAQCBwcm9qZWN0LS12Yy1saXN0LWZpbGVzCiAgICAgICAgICAgICAgICAgICAgKG1h cGNhcgogICAgICAgICAgICAgICAgICAgICAobGFtYmRhIChmaWxlKQogICAgICAgICAgICAg ICAgICAgICAgICh1bmxlc3MgKG1lbWJlciBmaWxlIHN1Ym1vZHVsZXMpCi0gICAgICAgICAg ICAgICAgICAgICAgICAoY29uY2F0IGRlZmF1bHQtZGlyZWN0b3J5IGZpbGUpKSkKKyAgICAg ICAgICAgICAgICAgICAgICAgIChpZiBwcm9qZWN0LWZpbGVzLXJlbGF0aXZlLW5hbWVzCisg ICAgICAgICAgICAgICAgICAgICAgICAgICAgZmlsZQorICAgICAgICAgICAgICAgICAgICAg ICAgICAoY29uY2F0IGRlZmF1bHQtZGlyZWN0b3J5IGZpbGUpKSkpCiAgICAgICAgICAgICAg ICAgICAgIChzcGxpdC1zdHJpbmcKICAgICAgICAgICAgICAgICAgICAgIChhcHBseSAjJ3Zj LWdpdC0tcnVuLWNvbW1hbmQtc3RyaW5nIG5pbCAibHMtZmlsZXMiIGFyZ3MpCiAgICAgICAg ICAgICAgICAgICAgICAiXDAiIHQpKSkpCkBAIC03MTYsNyArNzI2LDggQEAgcHJvamVjdC0t dmMtbGlzdC1maWxlcwogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBkaXIpKQog ICAgICAgICAgICAgKGFyZ3MgKGxpc3QgKGNvbmNhdCAiLW1jYXJkIiAoYW5kIGluY2x1ZGUt dW50cmFja2VkICJ1IikpCiAgICAgICAgICAgICAgICAgICAgICAgICAiLS1uby1zdGF0dXMi Ci0gICAgICAgICAgICAgICAgICAgICAgICAiLTAiKSkpCisgICAgICAgICAgICAgICAgICAg ICAgICAiLTAiKSkKKyAgICAgICAgICAgIGZpbGVzKQogICAgICAgICh3aGVuIGV4dHJhLWln bm9yZXMKICAgICAgICAgIChzZXRxIGFyZ3MgKG5jb25jIGFyZ3MKICAgICAgICAgICAgICAg ICAgICAgICAgICAgIChtYXBjYW4KQEAgLTcyNSw5ICs3MzYsMTIgQEAgcHJvamVjdC0tdmMt bGlzdC1maWxlcwogICAgICAgICAgICAgICAgICAgICAgICAgICAgIGV4dHJhLWlnbm9yZXMp KSkpCiAgICAgICAgKHdpdGgtdGVtcC1idWZmZXIKICAgICAgICAgIChhcHBseSAjJ3ZjLWhn LWNvbW1hbmQgdCAwICIuIiAic3RhdHVzIiBhcmdzKQotICAgICAgICAgKG1hcGNhcgotICAg ICAgICAgIChsYW1iZGEgKHMpIChjb25jYXQgZGVmYXVsdC1kaXJlY3RvcnkgcykpCi0gICAg ICAgICAgKHNwbGl0LXN0cmluZyAoYnVmZmVyLXN0cmluZykgIlwwIiB0KSkpKSkpKQorICAg ICAgICAgKHNldHEgZmlsZXMgKHNwbGl0LXN0cmluZyAoYnVmZmVyLXN0cmluZykgIlwwIiB0 KSkKKyAgICAgICAgICh1bmxlc3MgcHJvamVjdC1maWxlcy1yZWxhdGl2ZS1uYW1lcworICAg ICAgICAgICAoc2V0cSBmaWxlcyAobWFwY2FyCisgICAgICAgICAgICAgICAgICAgICAgICAo bGFtYmRhIChzKSAoY29uY2F0IGRlZmF1bHQtZGlyZWN0b3J5IHMpKQorICAgICAgICAgICAg ICAgICAgICAgICAgZmlsZXMpKSkKKyAgICAgICAgIGZpbGVzKSkpKSkKIAogKGRlZnVuIHBy 
b2plY3QtLXZjLW1lcmdlLXN1Ym1vZHVsZXMtcCAoZGlyKQogICAocHJvamVjdC0tdmFsdWUt aW4tZGlyCkBAIC05NzAsNiArOTg0LDcgQEAgcHJvamVjdC1maW5kLXJlZ2V4cAogICAobGV0 KiAoKGNhbGxlci1kaXIgZGVmYXVsdC1kaXJlY3RvcnkpCiAgICAgICAgICAocHIgKHByb2pl Y3QtY3VycmVudCB0KSkKICAgICAgICAgIChkZWZhdWx0LWRpcmVjdG9yeSAocHJvamVjdC1y b290IHByKSkKKyAgICAgICAgIChwcm9qZWN0LWZpbGVzLXJlbGF0aXZlLW5hbWVzIHQpCiAg ICAgICAgICAoZmlsZXMKICAgICAgICAgICAoaWYgKG5vdCBjdXJyZW50LXByZWZpeC1hcmcp CiAgICAgICAgICAgICAgIChwcm9qZWN0LWZpbGVzIHByKQpAQCAtMTAwMCw2ICsxMDE1LDgg QEAgcHJvamVjdC1vci1leHRlcm5hbC1maW5kLXJlZ2V4cAogICAocmVxdWlyZSAneHJlZikK ICAgKGxldCogKChwciAocHJvamVjdC1jdXJyZW50IHQpKQogICAgICAgICAgKGRlZmF1bHQt ZGlyZWN0b3J5IChwcm9qZWN0LXJvb3QgcHIpKQorICAgICAgICAgOzsgVE9ETzogTWFrZSB1 c2Ugb2YgYHByb2plY3QtZmlsZXMtcmVsYXRpdmUtbmFtZXMnIGJ5CisgICAgICAgICA7OyBz ZWFyY2hpbmcgZWFjaCByb290IHNlcGFyYXRlbHkgKG1heWJlIGluIHBhcmFsbGVsLCB0b28p LgogICAgICAgICAgKGZpbGVzCiAgICAgICAgICAgKHByb2plY3QtZmlsZXMgcHIgKGNvbnMK ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgKHByb2plY3Qtcm9vdCBwcikKQEAgLTEw NTQsNyArMTA3MSw4IEBAIHByb2plY3QtZmluZC1maWxlCiAgIChpbnRlcmFjdGl2ZSAiUCIp CiAgIChsZXQqICgocHIgKHByb2plY3QtY3VycmVudCB0KSkKICAgICAgICAgIChyb290IChw cm9qZWN0LXJvb3QgcHIpKQotICAgICAgICAgKGRpcnMgKGxpc3Qgcm9vdCkpKQorICAgICAg ICAgKGRpcnMgKGxpc3Qgcm9vdCkpCisgICAgICAgICAocHJvamVjdC1maWxlcy1yZWxhdGl2 ZS1uYW1lcyB0KSkKICAgICAocHJvamVjdC1maW5kLWZpbGUtaW4KICAgICAgKG9yICh0aGlu Zy1hdC1wb2ludCAnZmlsZW5hbWUpCiAgICAgICAgICAoYW5kIGJ1ZmZlci1maWxlLW5hbWUg KHByb2plY3QtLWZpbmQtZGVmYXVsdC1mcm9tIGJ1ZmZlci1maWxlLW5hbWUgcHIpKSkKQEAg LTExMzAsNyArMTE0OCwxMiBAQCBwcm9qZWN0LS1yZWFkLWZpbGUtY3BkLXJlbGF0aXZlCiAg ICAgICAgICAgICAoaWYgKD4gKGxlbmd0aCBjb21tb24tcHJlZml4KSAwKQogICAgICAgICAg ICAgICAgIChmaWxlLW5hbWUtZGlyZWN0b3J5IGNvbW1vbi1wcmVmaXgpKSkpCiAgICAgICAg ICAoY3BkLWxlbmd0aCAobGVuZ3RoIGNvbW1vbi1wYXJlbnQtZGlyZWN0b3J5KSkKLSAgICAg ICAgIChwcm9tcHQgKGlmICh6ZXJvcCBjcGQtbGVuZ3RoKQorICAgICAgICAgKGNvbW1vbi1w YXJlbnQtZGlyZWN0b3J5IChpZiAoZmlsZS1uYW1lLWFic29sdXRlLXAgKGNhciBhbGwtZmls 
ZXMpKQorICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBjb21tb24tcGFy ZW50LWRpcmVjdG9yeQorICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgKGNv bmNhdCBkZWZhdWx0LWRpcmVjdG9yeSBjb21tb24tcGFyZW50LWRpcmVjdG9yeSkpKQorICAg ICAgICAgKHByb21wdCAoaWYgKGFuZCAoemVyb3AgY3BkLWxlbmd0aCkKKyAgICAgICAgICAg ICAgICAgICAgICAgICAgYWxsLWZpbGVzCisgICAgICAgICAgICAgICAgICAgICAgICAgIChm aWxlLW5hbWUtYWJzb2x1dGUtcCAoY2FyIGFsbC1maWxlcykpKQogICAgICAgICAgICAgICAg ICAgICAgcHJvbXB0CiAgICAgICAgICAgICAgICAgICAgKGNvbmNhdCBwcm9tcHQgKGZvcm1h dCAiIGluICVzIiBjb21tb24tcGFyZW50LWRpcmVjdG9yeSkpKSkKICAgICAgICAgIChpbmNs dWRlZC1jcGQgKHdoZW4gKG1lbWJlciBjb21tb24tcGFyZW50LWRpcmVjdG9yeSBhbGwtZmls ZXMpCkBAIC0xMTY4LDYgKzExOTEsNyBAQCBwcm9qZWN0LS1yZWFkLWZpbGUtYWJzb2x1dGUK ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGFsbC1maWxlcyAmb3B0aW9u YWwgcHJlZGljYXRlCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBoaXN0 IG1iLWRlZmF1bHQpCiAgIChwcm9qZWN0LS1jb21wbGV0aW5nLXJlYWQtc3RyaWN0IHByb21w dAorICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICA7OyBGSVhNRTogTWFwIHJl bGF0aXZlIG5hbWVzIHRvIGFic29sdXRlPwogICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAocHJvamVjdC0tZmlsZS1jb21wbGV0aW9uLXRhYmxlIGFsbC1maWxlcykKICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgcHJlZGljYXRlCiAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgIGhpc3QgbWItZGVmYXVsdCkpCkBAIC0xMjE1LDYg KzEyMzksNyBAQCBwcm9qZWN0LWZpbmQtZmlsZS1pbgogICAgICAgICAgICAgICAgZGlycykK ICAgICAgICAgICAgIChwcm9qZWN0LWZpbGVzIHByb2plY3QgZGlycykpKQogICAgICAgICAg KGNvbXBsZXRpb24taWdub3JlLWNhc2UgcmVhZC1maWxlLW5hbWUtY29tcGxldGlvbi1pZ25v cmUtY2FzZSkKKyAgICAgICAgIChkZWZhdWx0LWRpcmVjdG9yeSAocHJvamVjdC1yb290IHBy b2plY3QpKQogICAgICAgICAgKGZpbGUgKHByb2plY3QtLXJlYWQtZmlsZS1uYW1lCiAgICAg ICAgICAgICAgICAgcHJvamVjdCAiRmluZCBmaWxlIgogICAgICAgICAgICAgICAgIGFsbC1m aWxlcyBuaWwgJ2ZpbGUtbmFtZS1oaXN0b3J5CmRpZmYgLS1naXQgYS9saXNwL3Byb2dtb2Rl cy94cmVmLmVsIGIvbGlzcC9wcm9nbW9kZXMveHJlZi5lbAppbmRleCA3NTVjM2RiMDRmZC4u MjlmYzZjZDU2MGYgMTAwNjQ0Ci0tLSBhL2xpc3AvcHJvZ21vZGVzL3hyZWYuZWwKKysrIGIv 
bGlzcC9wcm9nbW9kZXMveHJlZi5lbApAQCAtMTkyMiw3ICsxOTIyLDggQEAgeHJlZi1tYXRj aGVzLWluLWZpbGVzCiAgICAgICAgKGhpdHMgbmlsKQogICAgICAgIDs7IFN1cHBvcnQgZm9y IHJlbW90ZSBmaWxlcy4gIFRoZSBhc3N1bXB0aW9uIGlzIHRoYXQsIGlmIHRoZQogICAgICAg IDs7IGZpcnN0IGZpbGUgaXMgcmVtb3RlLCB0aGV5IGFsbCBhcmUsIGFuZCBvbiB0aGUgc2Ft ZSBob3N0LgotICAgICAgIChkaXIgKGZpbGUtbmFtZS1kaXJlY3RvcnkgKGNhciBmaWxlcykp KQorICAgICAgIChkaXIgKG9yIChmaWxlLW5hbWUtZGlyZWN0b3J5IChjYXIgZmlsZXMpKQor ICAgICAgICAgICAgICAgIGRlZmF1bHQtZGlyZWN0b3J5KSkKICAgICAgICAocmVtb3RlLWlk IChmaWxlLXJlbW90ZS1wIGRpcikpCiAgICAgICAgOzsgVGhlICdhdXRvJyBkZWZhdWx0IHdv dWxkIGJlIGZpbmUgdG9vLCBidXQgcmlwZ3JlcCBjYW4ndCBoYW5kbGUKICAgICAgICA7OyB0 aGUgb3B0aW9ucyB3ZSBwYXNzIGluIHRoYXQgY2FzZS4K --------------Ee3iJUYrcK7W1C7AZeYwJ6pY-- From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 29 16:27:30 2024 Received: (at 69188) by debbugs.gnu.org; 29 Apr 2024 20:27:30 +0000 Received: from localhost ([127.0.0.1]:58734 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s1Xb0-0002l5-4K for submit@debbugs.gnu.org; Mon, 29 Apr 2024 16:27:30 -0400 Received: from mxout6.mail.janestreet.com ([64.215.233.21]:34337) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s1Xax-0002kt-VF for 69188@debbugs.gnu.org; Mon, 29 Apr 2024 16:27:28 -0400 From: Spencer Baugh To: Dmitry Gutov Subject: Re: bug#69233: 30.0.50; project-files + project-find-file is slow in large repositories In-Reply-To: <4e8e8f14-26be-4a50-b47b-a0373ce19b9a@gutov.dev> (Dmitry Gutov's message of "Wed, 17 Apr 2024 02:48:44 +0300") References: <1b566e9e-eca5-4746-8e31-4155d35ce7a8@gutov.dev> <4e8e8f14-26be-4a50-b47b-a0373ce19b9a@gutov.dev> Date: Mon, 29 Apr 2024 16:27:01 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1714422421; bh=AkAd6h6ePt4FCHh9p3mPHvTbZ0pzDPNSHnS8H2+kk7I=; h=From:To:Cc:Subject:In-Reply-To:References:Date; 
b=qz1ZJQDRrzGLz7iUbVDfMVYGSSENfWgHkflCLxh+es9P5OuDAq/PvGPbsZKM0D2gX z7kuJQKPMHbuunz7fKYAnUEKF+vTLLWC92yrRb8Eb9geYUXOMPmXtdntxVwvMl/OjR 5jm1J+Eqx1i4uszFSPlK6OKgXULv1nORGQw2OgkW36n03TgOUWtxZR+xPIAuKcJNeb A6dOo8/qOV8nejaNQezin05kefXIzroCffhEnfdJ3T+bmO8qsoAnawZRrWbH2bd1OD rofGQ+wkX5EAjWEfn7d63Vgjmz5KD71TcCilvjwCizrR49HbL2g7ZJDUVnVGCddN5h hTeyIzqbrcf0g== X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 69188 Cc: 69233@debbugs.gnu.org, 69188@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.3 (/) Dmitry Gutov writes: > On 13/04/2024 05:34, Dmitry Gutov wrote: >> Both options are relatively clunky, and the second one might also >> fail to work when DIRS is non-nil (or would have to fall back to >> absolute names anyway), so I'm leaning toward the first one. It >> might also allow certain code to be written supporting both relative >> and absolute names (e.g. a process call both binds default-directory >> to root and keeps the file names as-is -- the relative ones would be >> interpreted as such, the rest just as they are interpreted now). > > Here's how that change can look. > > The patch should demonstrate both the performance improvements for > project-find-file and project-find-regexp, and some awkwardness in the > implementation, chiefly due to backward compatibility. > > Guess more tests will be required, at the very least. I see almost a 50% performance improvement with this patch in my large private repository, once adding support for project-files-relative-names in my internal project backend. Seems great so far. 
My benchmarking: (let ((proj (project-current))) (list (benchmark-run 10 (let ((project-files-relative-names t)) (length (project-files proj)))) (benchmark-run 10 (let ((project-files-relative-names nil)) (length (project-files proj)))))) ((17.605295389 28 7.647366087000023) (29.918302167 57 19.246283027999993)) From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 29 17:05:01 2024 Received: (at 69188) by debbugs.gnu.org; 29 Apr 2024 21:05:02 +0000 Received: from localhost ([127.0.0.1]:58804 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s1YBI-0003Ef-W8 for submit@debbugs.gnu.org; Mon, 29 Apr 2024 17:05:01 -0400 Received: from mxout5.mail.janestreet.com ([64.215.233.18]:58523) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s1YB9-0003EL-N6 for 69188@debbugs.gnu.org; Mon, 29 Apr 2024 17:04:52 -0400 From: Spencer Baugh To: Dmitry Gutov Subject: Re: bug#69233: 30.0.50; project-files + project-find-file is slow in large repositories In-Reply-To: <1b566e9e-eca5-4746-8e31-4155d35ce7a8@gutov.dev> (Dmitry Gutov's message of "Sat, 13 Apr 2024 05:34:18 +0300") References: <1b566e9e-eca5-4746-8e31-4155d35ce7a8@gutov.dev> Date: Mon, 29 Apr 2024 17:04:24 -0400 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=janestreet.com; s=waixah; t=1714424664; bh=tGMuXHjWDMUVZnBREMf1JpXGB3rrB1vP2l9A8HxuyVk=; h=From:To:Cc:Subject:In-Reply-To:References:Date; b=US5I6dBNZwwmikp9TSCNjZllYxBFckKGpbj8KxSJEo6ZjN9qzCraT3ZhuX66ZxOo1 DJ37ojXEcFMVjNaUJ6Ip90OA4VS1E9TGX3i0EoC1EFKsloqUrf1Lr0yzsSGEg1tX1y +ZpvXdkxXIaJK5XQs4Lwx+mrqz6wAh+wokVkc6ZQ5KJI5C6x10CLpLePziqOBzozWN YEUjh2Yqand71O8KaC3tWeTtJhC0oG6rAWmeoqyvNxM75GGGHG7QMc21IBCWsmewto QTM/OKKieK7uWPaG/htgHXVavI3fl2qHMMdT7IH+UdSYXOiFwspW3W8GzCMWw4zENL gcRQBV0d7cz6A== X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 69188 Cc: 69233@debbugs.gnu.org, 69188@debbugs.gnu.org X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.3 (/) Dmitry Gutov writes: > Hi Spencer, > > Sorry about the wait. > > On 16/02/2024 00:55, Spencer Baugh wrote: >> (project-files (project-current)) takes around 1 second in Linux >> (80k >> files) and 7 seconds in my larger (500k file) repository. >> With this patch: >> diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el >> index c7c07c3d34c..037beaa835a 100644 >> --- a/lisp/progmodes/project.el >> +++ b/lisp/progmodes/project.el >> @@ -667,12 +667,15 @@ >> (setq i (concat i "**")))) >> i))) >> extra-ignores))))) >> - (setq files >> - (mapcar >> - (lambda (file) (concat default-directory file)) >> - (split-string >> - (apply #'vc-git--run-command-string nil "ls-files" args) >> - "\0" t))) >> + (with-temp-buffer >> + (let ((ok (apply #'vc-git--out-ok "ls-files" args)) >> + (pt (point-min))) >> + (unless ok >> + (error "File listing failed: %s" (buffer-string))) >> + (goto-char pt) >> + (while (search-forward "\0" nil t) >> + (push (concat default-directory (buffer-substring-no-properties pt (1- (point)))) files) >> + (setq pt (point))))) >> (when (project--vc-merge-submodules-p default-directory) >> ;; Unfortunately, 'ls-files --recurse-submodules' conflicts with '-o'. >> (let* ((submodules (project--git-submodules)) >> project-files in Linux takes around .75 seconds. > > The patch makes sense (and the approach works okay in > project--files-in-directory), so this is something I've made a few > attempts to use in the past. > > However, the measurements on my machine show a much smaller > improvement -- just 3-4%. I.e. 
if I just evaluate the functions > interpreted or run them just once, the variations between the runs far > exceed the difference in runtimes (around ~450ms with a Linux > repository checkout from 2021, 70k files). > > A stricter comparison works out like this: > > 1. Apply the patch (or not), > 2. M-x byte-compile-file > 3. (load "project.elc") > 4. (benchmark-run 10 (project-files (project-current))) > > When run these in my working session one after another, the 10 > iteration benchmark works out to 4.09s vs 3.93s (master vs your > patch). > > (4.093848777 44 1.6119981489999944) > > vs > > (3.9392906549999998 41 1.499010061) > > With 'emacs -Q', however, it's vice versa: > > (3.777694389 130 1.2422826310000001) > > vs > > (3.889905663 165 1.46846598) > > It seems like, maybe, the longer running session is more sensitive to > the allocation of the initial long string than the fresh session. > > In any case, I don't mind switching to the other approach. Just > wondering where the difference between our machines might come from. > > Last but not least, when/if we apply this, we should keep the fix for > bug#66806 in there. Good news is it doesn't seem to affect > performance. Oh, interesting, I see roughly the same result. Benchmarking with: (benchmark-run 10 (project-files (project-current))) Running in my long-lived existing Emacs 29 session: Old: (4.434228319 14 2.850654906999921) New: (4.983809167 16 3.2989908669999295) In Emacs 29 emacs -Q: Old: (3.5112438729999997 130 1.9230644630000002) New: (3.819248509 171 2.309731412) But, in Emacs 30 emacs -Q: Old: (7.949549188 65 3.3445626799999992) New: (7.270785783999999 87 4.0610532379999995) So... the performance improvement seems highly unreliable. Probably not worth changing this area, then - the other patch to allow relative files will probably be more worth it. >> My proposal: Could we find a way to make the default-directory not >> necessary for the files returned from project-files? 
>> Perhaps project-files could be allowed to return relative file paths
>> which are relative to the project root. Then in the common case where
>> all the files are within the project root, project-find-file would be
>> way faster. Happy to implement this, if it makes sense.
>
> Yep, that should make sense. Originally the idea was to keep it more
> universal so that lists of files coming from the "external roots"
> could be handled the same way (used in the two *-or-external-*
> commands).
>
> But indeed it's the relatively rare case, so it'd be better to avoid
> paying the performance penalty, especially when the subsequent
> handling could do without the added prefix. And even the "rare case"
> could be split into separate calls instead of having all files
> returned at once.
>
> My main concern is backward compatibility, so that 3rd party callers
> don't break after the update.
>
> I think there are basically two approaches:
> - A new defvar like project-use-relative-names,
> - Or a new argument for 'project-files', e.g. called RELATIVE.
>
> Both options are relatively clunky, and the second one might also fail
> to work when DIRS is non-nil (or would have to fall back to absolute
> names anyway), so I'm leaning toward the first one. It might also
> allow certain code to be written supporting both relative and absolute
> names (e.g. a process call both binds default-directory to root and
> keeps the file names as-is -- the relative ones would be interpreted
> as such, the rest just as they are interpreted now).
>
> Both project-find-file and project-find-regexp should be able to
> benefit. Although the former might require a bigger update, given that
> the current project-read-file-name-function options don't expect
> relative names. Ideally we'd have a smoother migration for custom
> p-r-f-n-f functions, but I don't have any good ideas there.

I think the defvar approach seems reasonable.
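As a rough sketch of what honoring such a defvar could look like in a third-party backend: the function below is hypothetical (the name my-backend--file-names is invented for illustration and is not part of project.el). It assumes the backend already has its file names relative to the root. It returns them untouched only when the variable is set and DIRS is nil or names just the root. That is a deliberately conservative reading of the DIRS restriction.

```elisp
;;; -*- lexical-binding: t -*-

;; Declared so a caller's `let' binding is dynamic even where
;; project.el does not define the variable yet.
(defvar project-files-relative-names)

(defun my-backend--file-names (root relative-names &optional dirs)
  "Return RELATIVE-NAMES, which are relative to ROOT, for a caller.
Keep them relative when `project-files-relative-names' is non-nil
and DIRS is nil or contains only ROOT; otherwise prepend ROOT.
Hypothetical helper for a custom project backend."
  (let ((root (file-name-as-directory root)))
    (if (and (bound-and-true-p project-files-relative-names)
             (or (null dirs)
                 (and (null (cdr dirs))
                      (string= (file-name-as-directory
                                (expand-file-name (car dirs)))
                               (expand-file-name root)))))
        relative-names
      (mapcar (lambda (name) (concat root name)) relative-names))))
```

A backend's project-files method could call this as its last step, so callers that never bind the variable keep getting absolute names.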
The existing project-read-file-name-function options certainly don't
expect relative names, but they do actually work OK, e.g.:

(project--read-file-cpd-relative "" '("foo/bar" "foo1/bar") nil 'minibuffer-history)
(project--read-file-absolute "" '("foo/bar" "foo1/bar") nil 'minibuffer-history)

Both complete fine and return a filename fine. read-file-cpd-relative
returns an absolute filename, read-file-absolute returns a relative
filename.

Maybe the same is true for any custom project-read-file-name-functions
that exist? Maybe they will just work?

>> Another optimization I've considered: We could run the process
>> asynchronously so project-files parsing can be parallel with the
>> process; but the process is usually very fast anyway, that's not most of
>> the overhead, so that won't be a big win.
>
> Right. This came up in bug#64735, and together with patch in bug#66020
> the asynchronous file listing can run a bit faster than the
> synchronous implementation.
>
> I'm guessing the difference won't be huge in your case, since either
> way most time remains spent in Lisp code and GC. But if we take
> advantage of this by improving the UIs at the same time, this can be a
> real win.

Right.

> This should go into a separate discussion, I think, but to quickly sum
> up my thinking on the subject:
>
> - Ideally project-files implementations for sync and async UIs should
> always look the same. Hopefully the "async" implementation looks the
> same or almost the same as the "sync" one. Threads might help.
> - project-find-regexp could benefit from this a lot, first by running
> the search in parallel to the file listing, and second by showing
> the results right away (the current advantage of 'M-x grep'). The
> difficult part is to have the "async" Xref interface as well (can we do
> this without extending the current one? probably not). The UI also
> needs to have some "running ..."
indicator, as well as a way to > abort the search, killing both processes - that adds requirements to > "async Xref" as well. All seems reasonable. >> However, that would make it easy for project-files as a whole to be >> asynchronous. Then that would allow project-find-file to start the >> listing in the background, and then we'd write a completion table which >> completes only over whatever files we've already read into Emacs. I >> think this would be a lot nicer for most use-cases, and I'd again be >> happy to implement this. > > Could this be that simple? > > Whatever the source of the file listing, as soon as the UI (or > completion styles) calls try-completion or all-completions, the search > has to finish first, shouldn't it? That seems like the semantics of > this API. Or if perhaps we allow it to operate on incomplete results, > how would we indicate to the user at the end that the scan has > finished, and they can press TAB once more to refresh the results? Or > perhaps to be able to find a file they hadn't managed to find in the > incomplete set. > > This seems like it might require both a new UI and an extension of > completion table API. E.g. in certain cases we could say that we only > need N matches, so if the current incomplete set can provide as many, > we don't have to wait until the end. But 'try-completion' would become > unreliable either way. Yes, that's all true, and this is definitely not the intended semantics of the API, but I vaguely suspect it might be fine in practice? That vague suspicion can wait until later, though, because I think the more conservative approach you suggest is also a good improvement on its own. > Even if keeping to the most conservative approach, though, it should > be possible to at least render the prompt before the file listing is > finished. That could make the UI look a bit more responsive. True, that would be pretty nice. 
And further I suppose in the case of the default completion UI (which doesn't automatically display completions), the user can even type some input before hitting TAB and waiting. Also, I suppose that even non-default completion UIs would allow the user to type input, if the non-default completion UI uses while-no-input. So it would be a pretty responsive experience for such UIs (assuming we are careful in our implementation and don't have bugs when being interrupted). That sounds pretty great, actually. We avoid the blocking part of the UI without needing to think about how to surface "incomplete completion" in all the different completion UIs. From debbugs-submit-bounces@debbugs.gnu.org Sat May 04 20:29:51 2024 Received: (at 69188) by debbugs.gnu.org; 5 May 2024 00:29:51 +0000 Received: from localhost ([127.0.0.1]:56588 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s3PlH-0005OX-CC for submit@debbugs.gnu.org; Sat, 04 May 2024 20:29:51 -0400 Received: from wfout6-smtp.messagingengine.com ([64.147.123.149]:33937) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1s3PlB-0005OO-Hj; Sat, 04 May 2024 20:29:49 -0400 Received: from compute7.internal (compute7.nyi.internal [10.202.2.48]) by mailfout.west.internal (Postfix) with ESMTP id 90E821C000BB; Sat, 4 May 2024 20:29:15 -0400 (EDT) Received: from mailfrontend1 ([10.202.2.162]) by compute7.internal (MEProxy); Sat, 04 May 2024 20:29:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gutov.dev; h=cc :cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1714868955; x=1714955355; bh=KtRf4Ouzx6J8NNTDLmDcmHfL09SDMSoCuFh+/D6YN1s=; b= LpDPeLmUXr8c9KsT1n7Zk4a4ySBlyFp6S4if4tBywB0pbMULFii+lkZr/EHgAay6 BfLzaydXebiv1fV15n1PRva7/AMwY9EIrn/LU5fu8VsMsNmFaIDVNKfCi/Fi404Y LYsQpIb8+2khkYEXTCrkI4cDIQJ+abRw0dR3ASNmFyU0a12sUFEKiUDbDX6X1+nX 
m2UuA91/PBVDJWqSH+5OQrgY9sDQztOMZaRpay/mSMaTxOrDlEmLFdSDNghTTvLh Ug0NRMflQxipkw84vuxvTVScvXg/Z1kP8bHPy3Wmwzg81UC8GuCWsQxSqcvkUyMU okThEIM/xBkz9YnIJgFAwg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1714868955; x= 1714955355; bh=KtRf4Ouzx6J8NNTDLmDcmHfL09SDMSoCuFh+/D6YN1s=; b=F NdCRAPWSbSt82g2PxKcqWL964ihvCGbJ9bf11dS7CsJq2wtOhN7WRhzKrocvtArJ /xGPhZtmeYKxyxtcFoF51+VuL9EbLdVssXOGS8nlJAL4K31OqT/j4IKrtwFGRxLj GjTD9Rlp0ZOF5nq00Yps0wL8e+7h39ulUzzmNbUpqCD242qWn+eu2J0fBQYGzwBH YNehz+En6mEr/nDR+d2faJyry7bwldW/a9kD5/3aGKiZzkg7uuGZAaYlIe+mtG+z EHrNWDzv2XK8IOg7RRr/dNVRB/aZjB6v4NQ/C2oSKuIw80qFJ6Jr7BX8vzssvM8L AzAPRy1hv5+bfGRqZtD+g== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedvledrvddvfedgfeehucetufdoteggodetrfdotf fvucfrrhhofhhilhgvmecuhfgrshhtofgrihhlpdfqfgfvpdfurfetoffkrfgpnffqhgen uceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddmne cujfgurhepkfffgggfuffvvehfhfgjtgfgsehtjeertddtvdejnecuhfhrohhmpeffmhhi thhrhicuifhuthhovhcuoegumhhithhrhiesghhuthhovhdruggvvheqnecuggftrfgrth htvghrnhepteduleejgeehtefgheegjeekueehvdevieekueeftddvtdevfefhvdevgedu jeehnecuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomhepug hmihhtrhihsehguhhtohhvrdguvghv X-ME-Proxy: Feedback-ID: i0e71465a:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Sat, 4 May 2024 20:29:13 -0400 (EDT) Message-ID: <2296c5ba-9612-4f17-9246-2dde14a67655@gutov.dev> Date: Sun, 5 May 2024 03:29:12 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: bug#69233: 30.0.50; project-files + project-find-file is slow in large repositories To: Spencer Baugh References: <1b566e9e-eca5-4746-8e31-4155d35ce7a8@gutov.dev> 
<4e8e8f14-26be-4a50-b47b-a0373ce19b9a@gutov.dev> Content-Language: en-US From: Dmitry Gutov In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 69188 Cc: 69233@debbugs.gnu.org, 69188@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.7 (-) On 29/04/2024 23:27, Spencer Baugh wrote: > Dmitry Gutov writes: >> On 13/04/2024 05:34, Dmitry Gutov wrote: >>> Both options are relatively clunky, and the second one might also >>> fail to work when DIRS is non-nil (or would have to fall back to >>> absolute names anyway), so I'm leaning toward the first one. It >>> might also allow certain code to be written supporting both relative >>> and absolute names (e.g. a process call both binds default-directory >>> to root and keeps the file names as-is -- the relative ones would be >>> interpreted as such, the rest just as they are interpreted now). >> Here's how that change can look. >> >> The patch should demonstrate both the performance improvements for >> project-find-file and project-find-regexp, and some awkwardness in the >> implementation, chiefly due to backward compatibility. >> >> Guess more tests will be required, at the very least. > I see almost a 50% performance improvement with this patch in my large > private repository, once adding support for project-files-relative-names > in my internal project backend. Seems great so far. 
>
> My benchmarking:
>
> (let ((proj (project-current)))
>   (list (benchmark-run 10 (let ((project-files-relative-names t)) (length (project-files proj))))
>         (benchmark-run 10 (let ((project-files-relative-names nil)) (length (project-files proj))))))
>
> ((17.605295389 28 7.647366087000023)
>  (29.918302167 57 19.246283027999993))

Nice!

Too bad it still takes ~1.7s to list all the files in the project.
Well above the comfortable wait time (ideally <100ms or at least
<500ms, I guess).

From debbugs-submit-bounces@debbugs.gnu.org Sat May 04 23:32:53 2024
Date: Sun, 5 May 2024 06:32:12 +0300
Subject: Re: bug#69233: 30.0.50; project-files + project-find-file is slow in large repositories
To: Spencer Baugh
References: <1b566e9e-eca5-4746-8e31-4155d35ce7a8@gutov.dev>
From: Dmitry Gutov
Cc: 69233@debbugs.gnu.org, 69188@debbugs.gnu.org

On 30/04/2024 00:04, Spencer Baugh wrote:
> Oh, interesting, I see roughly the same result.
>
> Benchmarking with:
> (benchmark-run 10 (project-files (project-current)))
>
> Running in my long-lived existing Emacs 29 session:
> Old:
> (4.434228319 14 2.850654906999921)
> New:
> (4.983809167 16 3.2989908669999295)
>
> In Emacs 29 emacs -Q:
> Old:
> (3.5112438729999997 130 1.9230644630000002)
> New:
> (3.819248509 171 2.309731412)
>
> But, in Emacs 30 emacs -Q:
> Old:
> (7.949549188 65 3.3445626799999992)
> New:
> (7.270785783999999 87 4.0610532379999995)
>
> So... the performance improvement seems highly unreliable. Probably not
> worth changing this area, then - the other patch to allow relative files
> will probably be more worth it.

All right then, let's hold off on this potential change for now, and
maybe revisit it later. Maybe the new GC engine will swing the needle
in one or the other direction.

> I think the defvar approach seems reasonable.
>
> The existing project-read-file-name-function options certainly don't
> expect relative names, but they do actually work OK, e.g.:
>
> (project--read-file-cpd-relative "" '("foo/bar" "foo1/bar") nil 'minibuffer-history)

Evaluating this one with the version in master results in

  Debugger entered--Lisp error: (wrong-type-argument stringp nil)
    expand-file-name(nil)

hence the associated change in the patch.

> (project--read-file-absolute "" '("foo/bar" "foo1/bar") nil 'minibuffer-history)

No errors here, but two problems are that a) it doesn't show the
default-directory [meaning no indication in which project the read is
happening], and b) returning the relative name will mess up the
file-name-history entry. Good thing you noted the latter, it needs
explicit handling.
The former can be shown in the prompt, at least.

> Both complete fine and return a filename fine. read-file-cpd-relative
> returns an absolute filename, read-file-absolute returns a relative
> filename.
>
> Maybe the same is true for any custom project-read-file-name-functions
> that exist? Maybe they will just work?

So, apparently not.

Anyway, I've pushed the patch in commit 370b216f086. Here's hoping the breakage will be minimal.

>>> However, that would make it easy for project-files as a whole to be
>>> asynchronous. Then that would allow project-find-file to start the
>>> listing in the background, and then we'd write a completion table which
>>> completes only over whatever files we've already read into Emacs. I
>>> think this would be a lot nicer for most use-cases, and I'd again be
>>> happy to implement this.
>>
>> Could this be that simple?
>>
>> Whatever the source of the file listing, as soon as the UI (or
>> completion styles) calls try-completion or all-completions, the search
>> has to finish first, shouldn't it? That seems like the semantics of
>> this API. Or if perhaps we allow it to operate on incomplete results,
>> how would we indicate to the user at the end that the scan has
>> finished, and they can press TAB once more to refresh the results? Or
>> perhaps to be able to find a file they hadn't managed to find in the
>> incomplete set.
>>
>> This seems like it might require both a new UI and an extension of
>> completion table API. E.g. in certain cases we could say that we only
>> need N matches, so if the current incomplete set can provide as many,
>> we don't have to wait until the end. But 'try-completion' would become
>> unreliable either way.
>
> Yes, that's all true, and this is definitely not the intended semantics
> of the API, but I vaguely suspect it might be fine in practice? That
> vague suspicion can wait until later, though, because I think the more
> conservative approach you suggest is also a good improvement on its own.
Some async stuff could make a big improvement on top of it, but it seems to require a fair bit more complexity.

>> Even if keeping to the most conservative approach, though, it should
>> be possible to at least render the prompt before the file listing is
>> finished. That could make the UI look a bit more responsive.
>
> True, that would be pretty nice. And further I suppose in the case of
> the default completion UI (which doesn't automatically display
> completions), the user can even type some input before hitting TAB and
> waiting.

It could be advantageous if the search process starts right when (or before) the prompt is shown, then by the time the first input is entered the search could either be finished or have found some matches at least.

> Also, I suppose that even non-default completion UIs would allow the
> user to type input, if the non-default completion UI uses
> while-no-input. So it would be a pretty responsive experience for such
> UIs (assuming we are careful in our implementation and don't have bugs
> when being interrupted).

Not sure about this one:

1) If you only do the search while the user is not typing, it will finish later compared to the scheme in the previous paragraph.

2) Suppose you type a char, pause, then another one. Will the search start, abort, and then start again? That seems wasteful.

I'd ultimately prefer a scheme where work isn't thrown away - but that would require a more complex API. Including a way to abort the background computation (since typing won't do that anymore).

For some UIs and commands that makes sense (e.g. incremental interfaces like counsel-rg) because they perform the search with different inputs each time you type a new character. That kind of works for small-to-medium projects, and you can enjoy the responsiveness of the process. I'm not sure about this approach for big projects.
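
To make the "start the listing before the prompt, complete over whatever has arrived so far" idea a bit more concrete, here is a rough sketch. Everything named my/... is hypothetical and not part of project.el, and calling the external find program is only a stand-in for whatever backend produces the listing. It does not solve the problems raised above: try-completion stays unreliable while the listing is incomplete, and nothing tells the user when the scan has finished.

```elisp
;;; -*- lexical-binding: t -*-

(defvar my/partial-files nil
  "Files received so far from the background listing (hypothetical).")

(defvar my/partial-pending ""
  "Trailing incomplete line left over from the last chunk of output.")

(defun my/partial-files-filter (_proc chunk)
  "Process filter: add the complete lines in CHUNK to `my/partial-files'."
  (let ((lines (split-string (concat my/partial-pending chunk) "\n")))
    ;; The last element is "" when CHUNK ended in a newline; otherwise
    ;; it is an incomplete line that has to wait for the next chunk.
    (setq my/partial-pending (car (last lines)))
    (dolist (line (butlast lines))
      (unless (equal line "")
        (push line my/partial-files)))))

(defun my/partial-files-table ()
  "Return a completion table over the files received so far.
The list is looked up again on every completion request, so later
requests see files that arrived in the meantime."
  (completion-table-dynamic
   (lambda (_string) my/partial-files)))

(defun my/start-listing (dir)
  "Start listing files under DIR in the background; return the process.
The external `find' program is used purely for illustration."
  (setq my/partial-files nil
        my/partial-pending "")
  (let ((default-directory (file-name-as-directory dir)))
    (make-process :name "my-file-listing"
                  :command '("find" "." "-type" "f")
                  :connection-type 'pipe
                  :noquery t
                  :filter #'my/partial-files-filter)))
```

A caller would run (my/start-listing root) first and then (completing-read "Find file: " (my/partial-files-table)), so the prompt appears while output is still arriving. The work is not thrown away when the user types, but aborting the process once the prompt is dismissed would still need to be handled separately.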