Package: emacs;
Reported by: "Stefan-W. Hahn" <stefan.hahn <at> s-hahn.de>
Date: Fri, 21 Feb 2014 12:18:01 UTC
Severity: important
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Eli Zaretskii <eliz <at> gnu.org> To: "Stefan-W. Hahn" <stefan.hahn <at> s-hahn.de>, Stefan Monnier <monnier <at> iro.umontreal.ca> Cc: 16830 <at> debbugs.gnu.org Subject: bug#16830: [Bug] 24.3.50; massive slow down in forward-line Date: Mon, 10 Mar 2014 20:58:22 +0200
> Date: Sat, 22 Feb 2014 13:27:47 +0100 > From: "Stefan-W. Hahn" <stefan.hahn <at> s-hahn.de> > Cc: 16830 <at> debbugs.gnu.org > > my original org-mode file: > emacs 24.3.50.1 2.065 sec (cache-long-scans nil, set local) > emacs 24.3.1 0.722 sec (cache-long-line-scan nil; as it was) > > test-neuter.org: > emacs 24.3.50.1 0.925 sec (cache-long-scans nil, set local) > emacs 24.3.1 0.381 sec (cache-long-line-scan nil; as it was) So I think it's a very Good Thing that we have this bug report, because looking into this issue produced a surprising (for me) discovery: turning off the newline cache makes forward-line significantly (5 to 20 times, depending on the file) faster than when the cache is turned on, for files with relatively short lines. It looks like the GCC implementation of memchr, used by the "dumb loop" in find_newline, is so efficient that it easily outperforms the "smart loop" which uses the cache, unless lines are very long, like at least 400 characters, at which point the code performs with and without the cache at the same speed. This striking difference in speed goes mostly unnoticed, because typically Emacs seldom if ever calls forward-line too much (see below for how much should "too much" for this to become visible). However, typing "C-c C-c" in the OP's Org file does just that -- it causes forward-line be invoked a huge number of times, because it walks the (29K line) Org file one line at a time with this function: (defsubst org-goto-line (N) (save-restriction (widen) (goto-char (point-min)) (forward-line (1- N)))) IOW, to move from line N to line N+1, this goes back to the beginning, and then traverses all the N lines again, plus one more line. Not a very efficient way, to put it mildly. So I suggest that Org developers look into making this use case more efficient, no matter whether the changes suggested below are or aren't installed. Inspired by the above function, I profiled find_newline, which is the workhorse of forward-line, using xdisp.c as the test file and this silly program: (let ((n 1)) (while (not (eobp)) (goto-char 1) (forward-line n) (setq n (1+ n)))) Running this program on xdisp.c calls find_newline 30K times and exercises its inner loop, which looks for the next newline, 450 million times. On my machine and in an optimized build, this takes about 1 min 14 sec with the cache turned on, and only 12 sec with it turned off, a factor of 6. By careful optimization of find_newline, I have succeeded to slash the time of the above loop by a factor of 2, see the proposed patch below. The "C-c C-c" command in the OP's Org file runs 3 times faster with those changes, and takes only 2 sec instead of 6.5. This still leaves the no-cache operation faster by a factor of about 3, though, in both these test cases. Again, this is for files whose average line length is 30 to 50 characters. For files whose lines are at least 10 times longer, the times with and without cache become almost identical, and for longer lines the cache starts to win. It would be nice to be able to turn the cache on and off dynamically, depending on the actual line length of the buffer. I tried to implement this, but my naive implementation didn't work well, because sampling of the lines tends to be extremely un-representative. If someone can come up with a smarter implementation, please show it. Until we can dynamically estimate the line length and turn the cache on only for long lines, I suggest to leave the default ON, and install the patches below. My reasoning is that in most situations the slow-down is negligible, while for very long lines the speedup can be significant. For the record, here are the measurements I made, before and after the changes, with 2 test cases: xdisp.c scanning with the above program, and the "neutered" Org file posted by Stefan-W. Hahn: Test Code Optimized? Cache ON Cache OFF --------------------------------------------------- Org old NO 11.3s 0.8s Org new NO 3.6s 0.8s Org old YES 6.5s 0.3s Org new YES 2.0s 0.25s xdisp.c old NO 2m11.8s 14.7s xdisp.c new NO 1m03.3s 14.8s xdisp.c old YES 1m14.4s 12.0s xdisp.c new YES 32.5s 11.8s And here are the patches I propose. (Note that I only handled the forward scan; the backward scan is used much less, so I left it alone, but if someone thinks the asymmetry might be confusing, I can do the same surgery with backward scan.) Any objections to committing this? --- src/search.c.~2~ 2014-01-02 07:07:04.000000000 +0200 +++ src/search.c 2014-03-10 19:40:08.607562800 +0200 @@ -715,18 +715,61 @@ find_newline (ptrdiff_t start, ptrdiff_t examine. */ ptrdiff_t tem, ceiling_byte = end_byte - 1; - /* If we're looking for a newline, consult the newline cache - to see where we can avoid some scanning. */ + /* If we're using the newline cache, consult it to see whether + we can avoid some scanning. */ if (newline_cache) { ptrdiff_t next_change; + int result = 1; + immediate_quit = 0; - while (region_cache_forward - (cache_buffer, newline_cache, start, &next_change)) - start = next_change; - immediate_quit = allow_quit; + while (start < end && result) + { + ptrdiff_t lim1; - start_byte = CHAR_TO_BYTE (start); + result = region_cache_forward (cache_buffer, newline_cache, + start, &next_change); + if (result) + { + start = next_change; + lim1 = next_change = end; + } + else + lim1 = min (next_change, end); + + /* The cache returned zero for this region; see if + this is because the region is known and includes + only newlines. While at that, count any newlines + we bump into, and exit if we found enough off them. */ + start_byte = CHAR_TO_BYTE (start); + while (start < lim1 + && FETCH_BYTE (start_byte) == '\n') + { + start_byte++; + start++; + if (--count == 0) + { + if (bytepos) + *bytepos = start_byte; + return start; + } + } + /* If we found a non-newline character before hitting + position where the cache will again return non-zero + (i.e. no newlines beyond that position), it means + this region is not yet known to the cache, and we + must resort to the "dumb loop" method. */ + if (start < next_change && !result) + break; + result = 1; + } + if (start >= end) + { + start = end; + start_byte = end_byte; + break; + } + immediate_quit = allow_quit; /* START should never be after END. */ if (start_byte > ceiling_byte) @@ -762,9 +805,9 @@ find_newline (ptrdiff_t start, ptrdiff_t unsigned char *nl = memchr (lim_addr + cursor, '\n', - cursor); next = nl ? nl - lim_addr : 0; - /* If we're looking for newlines, cache the fact that - this line's region is free of them. */ - if (newline_cache) + /* If we're using the newline cache, cache the fact that + the region we just traversed is free of newlines. */ + if (newline_cache && cursor != next) { know_region_cache (cache_buffer, newline_cache, BYTE_TO_CHAR (lim_byte + cursor), @@ -840,7 +883,7 @@ find_newline (ptrdiff_t start, ptrdiff_t /* If we're looking for newlines, cache the fact that this line's region is free of them. */ - if (newline_cache) + if (newline_cache && cursor != prev + 1) { know_region_cache (cache_buffer, newline_cache, BYTE_TO_CHAR (ceiling_byte + prev + 1),
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.