#40407 - [PATCH] slow ENCODE_FILE and DECODE_FILE

GNU bug report logs - #40407
[PATCH] slow ENCODE_FILE and DECODE_FILE

Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 3 Apr 2020 16:11:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.

Message #8 received at 40407 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 40407 <at> debbugs.gnu.org
Subject: Re: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Fri, 03 Apr 2020 19:24:09 +0300

> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Fri, 3 Apr 2020 16:18:43 +0200
>
> ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow, and
> allocate copious amounts of memory, to the point that they often
> turn up in both memory and cpu profiles. (This is on macOS; I
> haven't checked the situation elsewhere.)

AFAIR, on macOS the situation is worse than elsewhere, because of the
normalization thing.

> For instance, a single call to file-relative-name, with ASCII-only
> arguments, manages to allocate 140 KiB. There are several conversion
> steps each involving creating temporary buffers as well as the
> compilation and execution of very large "quick-check" regexps.
> Example:
>
> (progn
>   (require 'profiler)
>   (profiler-reset)
>   (garbage-collect)
>   (profiler-start 'mem)
>   (file-relative-name "abc")
>   (profiler-stop)
>   (profiler-report))

Can you tell more about the conversion steps and the memory each one
allocates?

> Perhaps we can assume that file names codings are always
> ASCII-compatible

I don't think every encoding is ASCII compatible, so I don't see how
we can assume that in general. But the check whether an encoding is
ASCII-compatible takes a negligible amount of time, so why bother with
such an assumption?

> There is already a hack in encode_file_name that assumes that no
> unibyte string ever needs encoding; if so, the shortcut could
> perhaps be extended to decode_file_name and simplified.

I'm not sure I understand what you mean by extending the shortcut to
decode_file_name. Please elaborate.

> -  if (BUFFERP (dst_object))
> +  if (EQ (dst_object, Qt))
> +    {
> +      /* Fast path for ASCII-only input and an ASCII-compatible coding:
> +         act as identity.  */
> +      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
> +      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
> +          && (STRING_MULTIBYTE (string)
> +              ? (chars == bytes) : string_ascii_p (string)))
> +        return string;

I don't think we can return the same string if NOCOPY is non-zero.
The callers might not expect that, and you might inadvertently cause
the original string be modified behind the caller's back. But if
NOCOPY is 'false', I think this change is OK. Just make sure the test
suite doesn't start failing, maybe there's something else we are
missing.

Thanks.
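
The aliasing concern raised about the fast path can be illustrated outside Emacs with a small standalone C sketch. It is not the Emacs implementation: `bytes_ascii_p` and `encode_ascii_compat` are hypothetical names, plain byte arrays stand in for Lisp strings, and the full conversion is replaced by a placeholder copy. The sketch assumes that a true `nocopy` flag means the caller accepts getting the input object itself back; if the convention in the code under review is the reverse, the test on `nocopy` flips accordingly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return true if the first LEN bytes of S are all 7-bit ASCII.  */
bool
bytes_ascii_p (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    if (s[i] >= 0x80)
      return false;
  return true;
}

/* Hypothetical encoder for an ASCII-compatible coding.
   ASSUMPTION: NOCOPY true means the caller tolerates receiving SRC
   itself.  Otherwise the result is always freshly allocated, so the
   caller can modify it without touching SRC.
   Returns NULL if allocation fails.  */
unsigned char *
encode_ascii_compat (unsigned char *src, size_t len, bool nocopy)
{
  /* Identity fast path: ASCII-only input needs no conversion, and the
     caller has said that sharing the input is acceptable.  */
  if (nocopy && bytes_ascii_p (src, len))
    return src;

  /* Placeholder for the full conversion: this sketch only copies the
     bytes into a new NUL-terminated buffer.  */
  unsigned char *copy = malloc (len + 1);
  if (copy == NULL)
    return NULL;
  if (len > 0)
    memcpy (copy, src, len);
  copy[len] = '\0';
  return copy;
}
```

With this shape, a caller that passes a false `nocopy` never receives a pointer that aliases its argument, which is the property the review comment is asking the patch to preserve; the allocation is skipped only when the caller has opted in.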

This bug report was last modified 5 years and 61 days ago.

