GNU bug report logs - #40407
[PATCH] slow ENCODE_FILE and DECODE_FILE

Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 3 Apr 2020 16:11:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.



Message #8 received at 40407 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 40407 <at> debbugs.gnu.org
Subject: Re: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Fri, 03 Apr 2020 19:24:09 +0300

> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Fri, 3 Apr 2020 16:18:43 +0200
> 
> ENCODE_FILE and DECODE_FILE turn out to be surprisingly slow, and allocate copious amounts of memory, to the point that they often turn up in both memory and cpu profiles. (This is on macOS; I haven't checked the situation elsewhere.)

AFAIR, on macOS the situation is worse than elsewhere, because of the
Unicode normalization applied to file names there.

> For instance, a single call to file-relative-name, with ASCII-only arguments, manages to allocate 140 KiB. There are several conversion steps each involving creating temporary buffers as well as the compilation and execution of very large "quick-check" regexps. Example:
> 
> (progn
>   (require 'profiler)
>   (profiler-reset)
>   (garbage-collect)
>   (profiler-start 'mem)
>   (file-relative-name "abc")
>   (profiler-stop)
>   (profiler-report))

Can you tell more about the conversion steps and the memory each one
allocates?

> Perhaps we can assume that file name codings are always ASCII-compatible

I don't think every encoding is ASCII compatible, so I don't see how
we can assume that in general.  But the check whether an encoding is
ASCII-compatible takes a negligible amount of time, so why bother with
such an assumption?

> There is already a hack in encode_file_name that assumes that no unibyte string ever needs encoding; if so, the shortcut could perhaps be extended to decode_file_name and simplified.

I'm not sure I understand what you mean by extending the shortcut to
decode_file_name.  Please elaborate.

> -  if (BUFFERP (dst_object))
> +  if (EQ (dst_object, Qt))
> +    {
> +      /* Fast path for ASCII-only input and an ASCII-compatible coding:
> +         act as identity.  */
> +      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
> +      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
> +          && (STRING_MULTIBYTE (string)
> +              ? (chars == bytes) : string_ascii_p (string)))
> +        return string;

I don't think we can return the same string if NOCOPY is non-zero.
The callers might not expect that, and you might inadvertently cause
the original string to be modified behind the caller's back.

But if NOCOPY is 'false', I think this change is OK.  Just make sure
the test suite doesn't start failing, maybe there's something else we
are missing.
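
To make the aliasing concern concrete, here is a minimal standalone C
sketch of an identity fast path. It is not the Emacs implementation:
bytes_ascii_p, slow_convert, convert_name and the may_share flag are
hypothetical names invented for this illustration, with may_share
standing in for whichever NOCOPY setting permits handing the input back
to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if every byte in S[0..LEN) is 7-bit ASCII.  */
static bool
bytes_ascii_p (const unsigned char *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (s[i] & 0x80)
      return false;
  return true;
}

/* Hypothetical stand-in for the real conversion; in this sketch it only
   makes a fresh NUL-terminated copy.  Returns NULL if malloc fails.  */
static unsigned char *
slow_convert (const unsigned char *s, size_t len)
{
  unsigned char *out = malloc (len + 1);
  if (out)
    {
      memcpy (out, s, len);
      out[len] = '\0';
    }
  return out;
}

/* Return S itself only when the caller allows sharing (MAY_SHARE), the
   coding is ASCII-compatible, and the input is pure ASCII.  In every
   other case return a freshly allocated result that the caller owns,
   so writing to the result can never alter S.  */
static unsigned char *
convert_name (unsigned char *s, size_t len, bool ascii_compat,
              bool may_share)
{
  if (may_share && ascii_compat && bytes_ascii_p (s, len))
    return s;
  return slow_convert (s, len);
}
```

With this shape, a caller that did not opt in to sharing can modify the
result freely, while a caller that did opt in has to treat the result
as possibly being the very object it passed in.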

Thanks.



