GNU bug report logs - #31796
26.1; dired-do-find-regexp-and-replace fails to find multiline regexps


Package: emacs;

Reported by: Žygimantas Bruzgys <me <at> zygi.xyz>

Date: Tue, 12 Jun 2018 07:56:03 UTC

Severity: minor

Found in version 26.1



Message #107 received at 31796 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
Subject: Re: bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find
 multiline regexps
Date: Mon, 30 Nov 2020 04:25:31 +0200
On 24.11.2020 22:16, Eli Zaretskii wrote:
>> Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>> Date: Tue, 24 Nov 2020 21:43:22 +0200
>>
>> How about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31796#23 ?
> 
> The idea sounds fine to me.
> 
>> Someone more familiar with existing ports of Grep on different systems
>> should weigh in on it.
> 
> I don't think it's necessary.  We just need to probe Grep for support
> of these switches, and then use it.  The result cannot be worse than
> it is now.

Now that I've dug in a little, the situation seems difficult.

-Pz does work, but it forces Grep to consider the file as one long 
string. As a consequence, if we ask it to output the line number, the 
number will always be 1. That's not a helpful mode of operation.
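
To illustrate why the reported line number collapses to 1, here is a toy model in Python. It is not how Grep is implemented; it only assumes that a line-oriented tool numbers "records" split at a separator, and that -z switches that separator from newline to the null byte. The function name and the sample text are made up for the illustration.

```python
def record_numbers(data: bytes, needle: bytes, sep: bytes) -> list:
    """Return the 1-based numbers of the records that contain NEEDLE.

    Records are the pieces of DATA between occurrences of SEP, which
    is a simplified stand-in for how a line-oriented search tool
    numbers its output.
    """
    records = data.split(sep)
    return [i for i, rec in enumerate(records, 1) if needle in rec]


sample = b'alpha\nbeta names"\n  gamma\n'

# Newline-separated records: a single-line needle is found on line 2,
# but a needle containing a newline can never fit inside one record.
single_line = record_numbers(sample, b'names"', b"\n")
across_lines = record_numbers(sample, b'names"\n  ', b"\n")

# Null-separated records: the whole text is one record, so the
# multiline needle is found, but the only possible number is 1.
whole_file = record_numbers(sample, b'names"\n  ', b"\0")
```

In this model the multiline needle is only findable once the whole file is a single record, at which point the record number carries no information.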

Even if it worked differently, -P imposes a significant performance 
penalty from what I see, even when the extra syntax is not actually 
used. So we couldn't enable it by default.

There is a similar program called pcregrep which outputs in the expected 
format:

$ pcregrep -MHn "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el:772:  :type '(choice (const :tag "Read with 
completion from relative names"
                        project--read-file-cpd-relative)
lisp/progmodes/project.el:774:                 (const :tag "Read with 
completion from absolute names"
                        project--read-file-absolute)

...but it doesn't seem to have a way to reliably detect where a match 
result ends. When we're talking multiline, perhaps the searched file 
includes a string like "file-name/etc:number"? Some of our tests 
probably do. Grep has a flag -Z (or --null) which adds a null byte 
after file names, but pcregrep doesn't.

And anyway, pcregrep isn't usually installed by default.

ripgrep, OTOH, seems to combine both good features here:

$ rg -Hn --multiline --null "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el772:  :type '(choice (const :tag "Read with 
completion from relative names"
773:                        project--read-file-cpd-relative)
774:                 (const :tag "Read with completion from absolute names"
775:                        project--read-file-absolute)

And it also disables the multiline mode automatically if the regexp 
can't match a newline (the multiline mode is significantly slower).
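
As a rough sketch of why the null byte helps, the following Python parses output shaped like the ripgrep example above. It assumes, based only on that example, that a line carrying a file name has the form "FILE<NUL>LINE:TEXT" and that a line without a null byte has the form "LINE:TEXT" and belongs to the most recent file. The function name is hypothetical, and other ripgrep versions or options may print the file name on every line, which this parser also accepts.

```python
import re
from typing import Iterator, Tuple

# "LINE:TEXT", where TEXT may itself contain colons and digits.
_NUMBERED = re.compile(rb"(\d+):(.*)", re.DOTALL)


def parse_hits(output: bytes) -> Iterator[Tuple[str, int, str]]:
    """Yield (file, line number, text) for each output line."""
    current_file = None
    for raw in output.split(b"\n"):
        if not raw:
            continue
        if b"\0" in raw:
            # Everything before the first null byte is the file name,
            # even if it happens to look like "name:123:".
            name, raw = raw.split(b"\0", 1)
            current_file = name.decode(errors="replace")
        m = _NUMBERED.fullmatch(raw)
        if m is None or current_file is None:
            continue  # Not in the assumed format; skip it.
        yield current_file, int(m.group(1)), m.group(2).decode(errors="replace")


# A file name that would be ambiguous with a ":" separator alone.
sample = b"lisp/a:1:b.el\x00772:  :type x\n773: y\n"
hits = list(parse_hits(sample))
```

Because the file name ends at the null byte rather than at a colon, a name or a matched line containing "file-name/etc:number" does not confuse the split.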

To sum up, there are options, but I don't see a working solution that is 
based on GNU Grep. And that's the most portable search program we have, 
I think.

The other recommendations I see (here: 
https://unix.stackexchange.com/questions/112132/how-can-i-grep-patterns-across-multiple-lines) 
include bespoke scripts in sed or perl in command mode. These seem less 
portable, but if someone would like to try their hand at one that would 
also output file names and line numbers in the expected format, I'd be 
happy to benchmark it.
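
For anyone who wants a starting point, here is a minimal sketch of such a script. It is in Python rather than sed or perl, so it is no more portable than those, and it has not been benchmarked. It reads the whole file into memory, uses Python's regexp syntax (which differs from both Grep's and Emacs's), and prints every line touched by a match as "FILE<NUL>LINE:TEXT". The function names and that exact output format are assumptions made for the illustration.

```python
import re
from typing import Iterable, Iterator, Tuple


def multiline_hits(data: bytes, pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, text) for every line touched by a match.

    Two matches on the same line yield that line twice.
    """
    rx = re.compile(pattern.encode())
    line, pos = 1, 0
    for m in rx.finditer(data):
        # Count newlines incrementally instead of from the file start.
        line += data.count(b"\n", pos, m.start())
        pos = m.start()
        # Widen the match to whole lines.
        start = data.rfind(b"\n", 0, m.start()) + 1
        last = max(m.end() - 1, m.start())
        end = data.find(b"\n", last)
        if end == -1:
            end = len(data)
        for offset, text in enumerate(data[start:end].split(b"\n")):
            yield line + offset, text.decode(errors="replace")


def format_hits(path: str, hits: Iterable[Tuple[int, str]]) -> str:
    """Render hits with a null byte between file name and line number."""
    return "".join(f"{path}\0{num}:{text}\n" for num, text in hits)


def search_file(path: str, pattern: str) -> str:
    """Search one file and return the formatted hits."""
    with open(path, "rb") as f:
        return format_hits(path, multiline_hits(f.read(), pattern))
```

Calling search_file("lisp/progmodes/project.el", r'names"\n *') would return text in the same spirit as the ripgrep output above, with the file name repeated on every line.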




This bug report was last modified 4 years and 246 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.