Package: emacs;
Reported by: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Date: Sat, 19 Apr 2025 16:06:02 UTC
Severity: normal
Found in version 31.0.50
View this message in rfc822 format
From: Gerd Möllmann <gerd.moellmann <at> gmail.com> To: Stefan Monnier <monnier <at> iro.umontreal.ca> Cc: Stefan Kangas <stefankangas <at> gmail.com>, 77924 <at> debbugs.gnu.org Subject: bug#77924: 31.0.50; [Feature branch] Change marker implementation Date: Tue, 22 Apr 2025 16:33:50 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes: >>> - Any chance you could add a comment explaining how you arrived at >>> text_index_interval=4*1024? If it's arbitrarily chosen, adding a >>> comment to that effect might be useful. >> It's neither completely arbitrary nor can I really say much about it. >> INTERVAL / 2 is basically the worst-case distance one has to scan >> through text. Don't really know what to say more, and the above is >> pretty obvious. > > FWIW, the current code in `master` uses a threshold of 5k (sometimes 5k > bytes, sometimes 5k chars) as the minimum distance between > "cache-markers". > > It's a tradeoff between the size of the text-index array (and the time > it takes to populate/update it) and the time it takes to scan the text > from one of the positions recorded in that array to the position we're > actually interested in. > > We haven't really investigated which value would be ideal. True. It might even be different between machines, don't know. I once, in the beginning, tried large values, don't remember, maybe 16K, and performance seemed to drop, but not much. Difficult to find an ideal value. > The code that does the scan can probably be improved significantly > (e.g. by processing the text several bytes at a time). For sure. >> (defun elb-replace-region-contents-entry () >> (with-temp-buffer >> (let ((step (apply #'concat (make-list 2000 "🙂été👶🏿 ")))) >> (dotimes (_ (/ 100000 (length step))) >> (insert step))) >> >> (dotimes (_ 100000) >> (let* ((a (1+ (random (point-max)))) >> (b (1+ (random (point-max)))) >> (beg (min a b)) >> (end (max a b))) >> (replace-region-contents beg end "🙂été👶🏿 🙂été👶🏿 "))))) > > As discussed in the other thread, this performs fairly few > charpos->bytepos conversions, actually. Instead it spends most of its > time inside the `diffseq.h` code. You could try something like: > > (require 'subr-x) > > (defun elb-replace-region-contents-entry () > (with-temp-buffer > (let ((step (apply #'concat (make-list 2000 "🙂été👶🏿 ")))) > (dotimes (_ (/ 500000 (length step))) > (insert step)) > > (dotimes (_ 100000) > (let* ((a (+ (point-min) (random (buffer-size)))) > (b (min (+ a (length step) (random 512) -256) (point-max)))) > (replace-region-contents a b (lambda () step))))))) > > which should spend more time in Emacs's code (especially in > `buffer_chars_equal`) than in `diffseq.h` > >>> PS. BTW, a small procedural thing. Instead of merging master into a >>> scratch/ branch, I recommend deleting the branch, rebasing it on >>> master, and then pushing it again. This way, when we later merge it >>> into master, we avoid the merge commits, and the history is kept >>> clean and more easily reviewable. Not the end of the world either >>> way, but something to consider. >> >> Sorry, too much work :-). >> >> (For reviewing a branch with merges from master I recommend to find the >> latest merge commit, take the parent commit on the master side, and >> range diff with that. (If you have the merge in the reflog, that >> speeds up finding the latest merge, but that's only the case if you did >> the merge in that repo.)) > > git diff origin/master...origin/scratch/text-index > > should take care of it. > > If there's interest, I could take care of rebasing (and improving the > commit messages on the rebased commits) before merging into `master`. Please consider the branch yours, if you like, as far as I am concerned :-). Because... >> I consider this feature done when that is in, and play with arm64 NEON a >> bit instead. In C++ ;-). ...of that- > > BTW, regarding processing several bytes at a time, we could try > something like > > long unsigned eightbytes = BUF_FETCH_8BYTES (bytepos); > long unsigned eightbits = ((~eightbytes) & 0x8080808080808080) >> 7; > long unsigned fourtwobits = eightbits + (eightbits >> 32); > long unsigned twofourbits = fourtwobits + (fourtwobits >> 16); > nchars += (twofourbits + (twofourbits >> 8)) & 0xff; > > Not sure it would be significantly faster than the current code, tho. > Maybe an even easier first change would be to replace > > if (CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))) > ++charpos; > > with > > charpos += (~BUF_FETCH_BYTE (b, bytepos)) >> 7; > > so as to avoid a branch that can be sometimes hard to predict. > Hard to tell if that would be faster. Maybe if I know a bit more about the current state of affairs of SIMD support, one could write something using that. That would certainly be faster. On my M1 chips it would mean processing 16 bytes at a time (128 bit registers), and I've seen a presentation that talked about 256 and even 512 bits on some Intel CPUs (AMX).
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.