GNU bug report logs - #44983: Truncate long lines of grep output
Reported by: Juri Linkov <juri <at> linkov.net>
Date: Tue, 1 Dec 2020 08:56:01 UTC
Severity: normal
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
>>> Alternatively, xref--collect-matches-1 could apply the limit itself, no
>>> matter whether grep or rg is used. And it could make sure to only do that
>>> after the last match. This might be the slower option, but that is hard to
>>> say in advance; a comparison benchmark could help here.
>> I think that as long as the long string has not yet been inserted into the
>> buffer, truncating the string in the variable in xref--collect-matches-1
>> should be much faster.
>
> It would surely be faster, but how would that overhead compare to the
> whole operation?
>
> Could be negligible, except in the most extreme cases. After all, the main
> slowdown factor with long strings is the display engine, and it won't be in
> play there.
>
> The upside is we'd be able to support column limiting with Grep too. Which
> is the default configuration. And we'd extract the cutoff column into
> a more visible user option.
This is exactly what we need. After that this bug report/feature request
can be closed.
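
The idea of truncating the string in a variable, and only after the last match, could be sketched roughly as follows. This is an illustration, not the code in xref.el: the function name `my/truncate-after-last-match` and its arguments are hypothetical, and it assumes the match regexp is available at the point where the line is collected.

```elisp
(defun my/truncate-after-last-match (line regexp max-column)
  "Return LINE cut at MAX-COLUMN, but never before the end of a match.
LINE is one line of search output, REGEXP is the search pattern.
This is a hypothetical helper, not part of xref.el."
  (let ((cut max-column)
        (start 0))
    ;; Move the cutoff to the right so that it covers every match of REGEXP.
    (while (and (< start (length line))
                (string-match regexp line start))
      (setq cut (max cut (match-end 0)))
      ;; Always advance, even if REGEXP matched the empty string.
      (setq start (max (match-end 0) (1+ start))))
    (if (> (length line) cut)
        (substring line 0 cut)
      line)))

(my/truncate-after-last-match "abc foo def foo ghijkl" "foo" 5)
;; => "abc foo def foo"
```

Because this works on a plain string before anything is inserted into a buffer, the display engine is not involved, and the same limit applies whether the line came from grep or rg.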
BTW, for sorting currently xref-search-program-alist uses:
"| sort -t: -k1,1 -k2n,2"
but fortunately ripgrep has a special option to do the same with:
"--sort path"
>>> That aside, could you explain the difference between the regexps? Do grep
>>> and rg use different colors or something like that? Ideally, of course,
>>> that would be just 1 regexp (if that's possible without loss in
>>> performance, or significant loss in clarity).
>> They should be merged into one regexp indeed. Because after customizing it
>> to the rg regexp, grep output doesn't highlight matches anymore (I use both
>> grep and rg interchangeably by different commands).
>> Currently their separate regexps are:
>> grep:
>> "\033\\[0?1;31m
>> \\(.*?\\)
>> \033\\[[0-9]*m"
>> rg:
>> "\033\\[[0-9]*m
>> \033\\[[0-9]*1m
>> \033\\[[0-9]*1m
>> \\(.*?\\)
>> \033\\[[0-9]*0m"
>> That could be combined into one regexp:
>> "\033\\[[0-9?;]*m
>> \\(?:\033\\[[0-9]*1m\\)\\{0,2\\}
>> \\(.*?\\)
>> \033\\[[0-9]*0?m"
>
> Makes sense. Is the parsing performance the same?
Performance is not a problem. The problem is that a more lax regexp
causes more false positives. So the above regexp highlighted even
the separator colons (':') between file names and column numbers.
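
The false positive can be reproduced with a small sketch. The two regexps are copied from the message above, assuming its line breaks are only for display. The sample string is a made-up fragment (a colon between two SGR reset sequences), not captured rg output, and the `my/` names are hypothetical.

```elisp
;; Regexps from the thread, each joined onto a single line.
(defvar my/grep-match-regexp
  "\033\\[0?1;31m\\(.*?\\)\033\\[[0-9]*m")
(defvar my/combined-match-regexp
  "\033\\[[0-9?;]*m\\(?:\033\\[[0-9]*1m\\)\\{0,2\\}\\(.*?\\)\033\\[[0-9]*0?m")

;; Made-up fragment: a separator colon between two reset sequences.
(defvar my/separator-sample "\033[0m:\033[0m")

(defun my/highlighted-text (regexp string)
  "Return the text REGEXP would highlight in STRING, or nil if no match."
  (when (string-match regexp string)
    (match-string 1 string)))

(my/highlighted-text my/grep-match-regexp my/separator-sample)
;; => nil
(my/highlighted-text my/combined-match-regexp my/separator-sample)
;; => ":"
```

The stricter grep regexp requires the bold red sequence and so ignores the separator, while the combined regexp accepts any SGR sequence as the opening delimiter and captures the colon.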
BTW, it's possible to see all highlighted parts of the output
by changing the argument 'MODE' of 'compilation-start' in 'grep'
from #'grep-mode to t (so it uses comint-mode in grep buffers).
Anyway, I found the shortest change needed to support ripgrep,
and pushed it to master.
> Also, with the increased complexity, I'd rather we added a couple of tests,
> or a comment with output examples. Or maybe both.
Fortunately, we have all possible cases listed in etc/grep.txt,
so it was easy to check if everything is highlighted correctly now.
Also I added ripgrep samples to etc/grep.txt.
This bug report was last modified 3 years and 19 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.