GNU bug report logs - #40407
[PATCH] slow ENCODE_FILE and DECODE_FILE

Previous Next

Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 3 Apr 2020 16:11:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Mattias Engdegård <mattiase <at> acm.org>
Subject: bug#40407: closed (Re: bug#40407: [PATCH] slow ENCODE_FILE and
 DECODE_FILE)
Date: Sat, 11 Apr 2020 15:10:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 40407 <at> debbugs.gnu.org.

-- 
40407: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=40407
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Mattias Engdegård <mattiase <at> acm.org>
To: 40407-done <at> debbugs.gnu.org
Cc: Kenichi Handa <handa <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 OGAWA Hirofumi <hirofumi <at> mail.parknet.co.jp>
Subject: Re: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Sat, 11 Apr 2020 17:09:27 +0200
I think we are done here -- now that all calls to ENCODE_FILE and DECODE_FILE have been checked to be safe for no-copy semantics, there is no need to copy in the ASCII identity case; pushed to master.


[Message part 3 (message/rfc822, inline)]
From: Mattias Engdegård <mattiase <at> acm.org>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Fri, 3 Apr 2020 16:18:43 +0200
[Message part 4 (text/plain, inline)]
ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow, and allocate copious amounts of memory, to the point that they often turn up in both memory and cpu profiles. (This is on macOS; I haven't checked the situation elsewhere.)

For instance, a single call to file-relative-name, with ASCII-only arguments, manages to allocate 140 KiB. There are several conversion steps each involving creating temporary buffers as well as the compilation and execution of very large "quick-check" regexps. Example:

(progn
  (require 'profiler)
  (profiler-reset)
  (garbage-collect)
  (profiler-start 'mem)
  (file-relative-name "abc")
  (profiler-stop)
  (profiler-report))

This applies to just about every function dealing with files or file names.

The attached patch is somewhat conservatively written but at least a starting point. It reduces the memory consumption by file-relative-name in the example above to zero. Perhaps we can assume that file names codings are always ASCII-compatible; if so, the shortcut can be taken in encode_file_name and decode_file_name directly.

There is already a hack in encode_file_name that assumes that no unibyte string ever needs encoding; if so, the shortcut could perhaps be extended to decode_file_name and simplified.

[0001-Avoid-expensive-recoding-for-ASCII-identity-cases.patch (application/octet-stream, attachment)]

This bug report was last modified 5 years and 91 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.