Package: emacs;
Reported by: Ihor Radchenko <yantar92 <at> gmail.com>
Date: Thu, 23 Dec 2021 11:06:01 UTC
Severity: normal
Found in version 29.0.50
Done: Mattias Engdegård <mattiase <at> acm.org>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Ihor Radchenko <yantar92 <at> gmail.com> To: Mattias Engdegård <mattiase <at> acm.org> Cc: 52753 <at> debbugs.gnu.org Subject: bug#52753: 29.0.50; Printing long list-like structures fails Date: Fri, 24 Dec 2021 18:56:19 +0800
Mattias Engdegård <mattiase <at> acm.org> writes: > The elisp printer isn't really geared towards printing deeply nested values since it uses recursion. Perhaps you could just print the contents of your skip list as a flat list or vector for serialization purposes, and rebuild the structure when loading it back in again. Thanks for the suggestion. It is probably what I have to do anyway to support older Emacs. However, it will not solve the problem revealed in this bug report. I understand that the current implementations of the read/prin1 are recursive. Yet, they are able to read ordinary lists in non-recursive way. Skip list is conceptually very similar to ordinary list. Maybe the existing list reader functionality can be somehow adapted to read list-like structures? And more generally, I suspect that M-x memory-report suffers from similar problem with prin1 - treating list-like structure recursively. While you showed how to circumvent the printing/reading of skip lists, I do not see how to fix the memory-report behaviour and the Emacs crash (which, I guess, may also be related to treating list-likes recursively). I would like to highlight the crash reproducer especially. I may accept limitation for printer/reader, but crash cannot be tolerated. <end of on-topic> ------- <begin of off-topic to the bug report> > (I'd use a balanced tree-like structure instead of a skip list. Performance is similar, and allows for immutability and persistence.) This is not directly related to my bug report, but if you have a good knowledge of using data structures in practice, I would appreciate comments. I know that balanced trees have the same asymptotic behaviour with skip lists and better worst-case behaviour. However, the original Pugh's paper introducing skip lists did a comparison between skip list and self-balancing trees. Pugh et al [1] showed that insertion time in skip lists is still O(log N), but with smaller constant. Moreover, he also showed that skip lists can be more performant when the data structure is edited locally [2]. [1] W Pugh [Springer Berlin Heidelberg] (1989) A Probabilistic Alternative to Balanced Trees. Univ https://doi.org/10.1007/3-540-51542-9_36 [2] W. Pugh (1990) A skip list cookbook https://www.semanticscholar.org/paper/A-skip-list-cookbook-Pugh/83ebde23871ce9f839ad82e0247a7481410f994e The problem I am trying to solve is performance of parser cache for Org mode files. The common operations on the cache are: (1) random element lookups (element at point while user edits the Org buffer); (2) iterating cache elements one-by-one to filter them (agenda views, column views, sparse trees, etc); (3) Deleting or inserting a series of adjacent elements upon file modification (normal user edits). (4) Combination of cache iteration with adding/removing elements in-place (agenda lookups when parts of the cache should be calculated as we find that some elements are not yet in cache). The current implementation of org-element-cache is indeed using balanced AVL tree (built-in Emacs implementation). However, given that many cache operations are not random, but involve groups of subsequent elements, I wanted to give skip list a try. The benchmarks (see code below) of skip list vs. AVL-tree give the following results: Skip list: - Insertion of 200k random numbers: 0.80 sec - Lookup of 200k random numbers: 0.73 sec - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.42 sec - Lookup of 200k subsequent numbers: 0.32 sec AVL tree: - Insertion of 200k random numbers: 0.41 sec - Lookup of 200k random numbers: 0.18 sec - Insertion of 200k subsequent numbers (0, 1, 2, ...): 0.44 sec (often triggers garbage collection) - Lookup of 200k subsequent numbers: 0.17 sec My implementation of skip list is comparable for subsequent number queries and 2-4x worse for random queries. A more tricky benchmark is iterating the list/tree and simultaneously altering it (a common operation in agenda/sparse tree code): For both Skip list and AVL tree I first create a 100k numbers (0, 2, 4, ..., 200000) and then iterate over the data structure adding the remaining odd numbers on every iteration. For skip list: Elapsed time: 0.509455s For AVL tree: Elapsed time: 1.348294s This time, the skip list "wins" also requiring a _lot_ less code (see below). Every insertion to AVL tree requires a O(log N) extra comparisons to find the new element and O(log N) extra memory for new stack. Finally, a real-world test for a complex agenda query on a 15Mb Org file (the data is output from elp-results): For skip list: ;; org-element-cache-map 1986 18.579660950 0.0093553176 ;; org-element-cache-map 1986 19.202719912 0.0096690432 For AVL tree: ;; org-element-cache-map 1986 21.047835249 0.0105981043 ;; org-element-cache-map 1986 20.733410823 0.0104397838 AVL tree is 10% slower. Given that most of the CPU time is devoted to actual parsing, 10% overall difference when switching from AVL tree to skip list is very good. In fact, my daily usage of skip-list branch when most of the file elements are already cached feels a lot snappier (though I have no consistent benchmark to test this and may fall into cognitive bias). In summary, skip list appears to give benefit in terms to overall speed (within context of grammar cache). The current implementation of skip list + org-element-cache is slightly faster compared to AVL tree and feels much snappier. However, the difference is indeed just a constant and probably depends on query patterns. If built-in AVL tree is optimised, the performance benefit may disappear (or not, if one optimises my skip list implementation better). The other aspect is ease of usage. AVL tree is trickier to work with (see below code example). Though I do not have enough practical experience with skip lists to know the possible code pitfalls years later. Any commends are welcome. ----- The code: ;; Random queries (require 'org-skip-list) (setq skiplist (org-skip-list-create #'<)) (benchmark-run 200000 (org-skip-list-insert skiplist (random 1000000000))) (benchmark-run 200000 (org-skip-list-find skiplist (random 1000000000))) ;; Ordered queries (require 'org-skip-list) (setq skiplist (org-skip-list-create #'<)) (setq i 0) (benchmark-run 200000 (org-skip-list-insert skiplist i) (cl-incf i)) (setq i 0) (benchmark-run 200000 (org-skip-list-find skiplist i) (cl-incf i)) ;; Random queries (require 'avl-tree) (setq tree (avl-tree-create #'<)) (benchmark-run 200000 (avl-tree-enter tree (random 1000000000))) (benchmark-run 200000 (avl-tree-member-p tree (random 1000000000))) ;; Ordered queries (require 'avl-tree) (setq tree (avl-tree-create #'<)) (setq i 0) (benchmark-run 200000 (avl-tree-enter tree i) (cl-incf i)) (setq i 0) (benchmark-run 200000 (avl-tree-member-p tree i) (cl-incf i)) ;; Modification during iteration (require 'org-skip-list) (setq skiplist (org-skip-list-create #'<)) (let ((i 0)) (while (<= i 200000) (when (= 0 (% i 2)) (org-skip-list-insert skiplist i)) (setq i (+ i 2)))) (benchmark-progn (iter-do (data (org-skip-list-iter skiplist)) (when (< data 200000) (org-skip-list-insert skiplist (1+ data))))) ;; Modification during iteration (require 'avl-tree) (setq tree (avl-tree-create #'<)) (let ((i 0)) (while (<= i 200000) (when (= 0 (% i 2)) (avl-tree-enter tree i)) (setq i (+ i 2)))) (benchmark-progn (let ((node (avl-tree--node-left (avl-tree--dummyroot tree))) data continue-flag (stack (list nil)) (leftp t) (start 0)) (while (and node (<= start 200000)) (setq data (avl-tree--node-data node)) (if (and leftp (avl-tree--node-left node) ; Left branch. ;; ... or when we are before START. (< start data)) (progn (push node stack) (setq node (avl-tree--node-left node))) (unless (< data start) (if (= start data) (cl-incf start) (avl-tree-enter tree start) (setq node (avl-tree--node-left (avl-tree--dummyroot tree)) stack (list nil) leftp t continue-flag t))) (if continue-flag (setq continue-flag nil) (setq node (if (and (car stack) ;; If START advanced beyond stack parent, skip the right branch. (< (avl-tree--node-data (car stack)) start)) (progn (setq leftp nil) (pop stack)) ;; Otherwise, move ahead into the right ;; branch when it exists. (if (setq leftp (avl-tree--node-right node)) (avl-tree--node-right node) (pop stack))))))))) Best, Ihor
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.