GNU bug report logs - #44983
Truncate long lines of grep output


Package: emacs

Reported by: Juri Linkov <juri <at> linkov.net>

Date: Tue, 1 Dec 2020 08:56:01 UTC

Severity: normal

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.



Message #77 received at 44983 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Juri Linkov <juri <at> linkov.net>
Cc: 44983 <at> debbugs.gnu.org
Subject: Re: bug#44983: Truncate long lines of grep output
Date: Wed, 9 Dec 2020 22:06:01 +0200

On 09.12.2020 21:17, Juri Linkov wrote:
>>> I think until a long string is inserted to the buffer, truncating the
>>> string in the variable in xref--collect-matches-1 should be much faster.
>>
>> It would surely be faster, but how would that overhead compare to the
>> whole operation?
>>
>> Could be negligible, except in the most extreme cases. After all, the main
>> slowdown factor with long strings is the display engine, and it won't be in
>> play there.
>>
>> The upside is we'd be able to support column limiting with Grep too. Which
>> is the default configuration. And we'd extract the cutoff column into
>> a more visible user option.
> 
> This is exactly what we need.  After that this bug report/feature request
> can be closed.

Perhaps you would like to come up with the name for the new user option? 
The changes to xref--collect-matches-1 should be straightforward (it 
will include a choice, though: whether to cut off matches when they 
don't fit). Since you're the one who has experienced poor performance 
because of this, though, you can do the benchmarking. Basically, what we 
need to know is whether the new option indeed makes performance acceptable.
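
To make the idea concrete, here is a minimal sketch of truncating a matched line while it is still a plain string, before it reaches the buffer. The variable `my-xref-truncation-width` and the function `my-xref-truncate-line` are hypothetical names made up for this illustration; they are not part of xref.el, and the sketch does not address the open choice of what to do with matches that fall beyond the cutoff.

```elisp
;; Hypothetical names for illustration; neither exists in xref.el.
(defvar my-xref-truncation-width 200
  "Hypothetical cutoff: maximum number of characters kept per matched line.")

(defun my-xref-truncate-line (line)
  "Return LINE shortened to at most `my-xref-truncation-width' characters.
Lines that already fit are returned unchanged."
  (if (> (length line) my-xref-truncation-width)
      (substring line 0 my-xref-truncation-width)
    line))
```

Calling `(my-xref-truncate-line (make-string 1000 ?x))` would return a string of 200 characters, so the display engine never sees the long version.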

> BTW, for sorting currently xref-search-program-alist uses:
> 
>      "| sort -t: -k1,1 -k2n,2"
> 
> but fortunately ripgrep has a special option to do the same with:
> 
>      "--sort path"
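
For readers unfamiliar with the sort keys quoted above: `-t:` splits each line on colons, `-k1,1` orders by the first field (the file name), and `-k2n,2` then orders numerically by the second field (the line number). The sketch below imitates that ordering in Emacs Lisp. The function name `my-sort-grep-lines` is hypothetical, the input is assumed to be "FILE:LINE:TEXT" strings with no colons in the file names, and `string<` may order file names differently from a locale-aware `sort`.

```elisp
;; Hypothetical helper; assumes each element looks like "FILE:LINE:TEXT".
(defun my-sort-grep-lines (lines)
  "Return a copy of LINES ordered by file name, then numerically by line."
  (sort (copy-sequence lines)
        (lambda (a b)
          (let ((fields-a (split-string a ":"))
                (fields-b (split-string b ":")))
            (if (string= (car fields-a) (car fields-b))
                ;; Same file: compare line numbers as numbers, not as text.
                (< (string-to-number (cadr fields-a))
                   (string-to-number (cadr fields-b)))
              (string< (car fields-a) (car fields-b)))))))
```

The numeric comparison is the point of the `n` flag: with a plain text comparison, line 10 would sort before line 9.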

Somehow, that option came out consistently slower in my 
benchmarking, even when the results are only a few lines (that's 
actually when the difference should be most apparent, because with many 
lines Elisp takes up most of the CPU time). You can try it yourself:

(benchmark 10 '(project-find-regexp ":package-version '(xref"))

  0.86 with '| sort'
  1.33 with '--sort path'

$ rg --version
ripgrep 12.1.1 (rev 7cb211378a)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

We can still document it in the docstring, though, for those who don't 
have 'sort' installed.

>>> They should be merged into one regexp indeed.  Because after customizing it
>>> to the rg regexp, grep output doesn't highlight matches anymore (I use both
>>> grep and rg interchangeably by different commands).
>>> Currently their separate regexps are:
>>> grep:
>>> "\033\\[0?1;31m
>>>    \\(.*?\\)
>>>    \033\\[[0-9]*m"
>>> rg:
>>> "\033\\[[0-9]*m
>>>    \033\\[[0-9]*1m
>>>    \033\\[[0-9]*1m
>>>    \\(.*?\\)
>>>    \033\\[[0-9]*0m"
>>> That could be combined into one regexp:
>>> "\033\\[[0-9?;]*m
>>>    \\(?:\033\\[[0-9]*1m\\)\\{0,2\\}
>>>    \\(.*?\\)
>>>    \033\\[[0-9]*0?m"
>>
>> Makes sense. Is the parsing performance the same?
> 
> Performance is not a problem.  The problem is that more lax regexp
> causes more false positives.  So the above regexp highlighted even
> the separator colons (':') between file names and column numbers.
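
The false positive described above can be reproduced with a small sketch. The two regexps are the grep and combined ones quoted earlier, joined into single strings; the `my-` names and the sample line are assumptions made up for this illustration (the sample imagines a tool that colors the ':' separators), not actual output captured from grep or rg.

```elisp
;; The grep regexp quoted above, written as one string.
(defconst my-grep-match-regexp
  "\033\\[0?1;31m\\(.*?\\)\033\\[[0-9]*m")

;; The combined regexp quoted above, written as one string.
(defconst my-combined-match-regexp
  (concat "\033\\[[0-9?;]*m"
          "\\(?:\033\\[[0-9]*1m\\)\\{0,2\\}"
          "\\(.*?\\)"
          "\033\\[[0-9]*0?m"))

;; Hypothetical sample: colored ':' separators, then a bold red match "foo".
(defconst my-sample-line
  "a.el\033[36m:\033[m12\033[36m:\033[m\033[01;31mfoo\033[m bar")

(defun my-first-highlight (regexp line)
  "Return the text captured by group 1 of REGEXP in LINE, or nil."
  (when (string-match regexp line)
    (match-string 1 line)))
```

With these definitions, `(my-first-highlight my-grep-match-regexp my-sample-line)` captures "foo", while the combined regexp captures ":" first, because its opening part accepts any color sequence rather than only the bold red one.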
> 
> BTW, it's possible to see all highlighted parts of the output
> by changing the argument 'MODE' of 'compilation-start' in 'grep'
> from #'grep-mode to t (so it uses comint-mode in grep buffers).

Because ansi-color-process-output is in comint-output-filter-functions?

> Anyway, I found the shortest change needed to support ripgrep,
> and pushed to master.

Excellent.

>> Also, with the increased complexity, I'd rather we added a couple of tests,
>> or a comment with output examples. Or maybe both.
> 
> Fortunately, we have all possible cases listed in etc/grep.txt,
> so it was easy to check if everything is highlighted correctly now.
> Also I added ripgrep samples to etc/grep.txt.

Thanks!




This bug report was last modified 3 years and 18 days ago.


