GNU bug report logs - #31796
26.1; dired-do-find-regexp-and-replace fails to find multiline regexps


Package: emacs;

Reported by: Žygimantas Bruzgys <me <at> zygi.xyz>

Date: Tue, 12 Jun 2018 07:56:03 UTC

Severity: minor

Found in version 26.1


From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Juri Linkov <juri <at> linkov.net>
Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
Subject: bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find multiline regexps
Date: Thu, 17 Dec 2020 02:40:09 +0200
On 16.12.2020 22:32, Juri Linkov wrote:
>>> Another backup plan is to use ripgrep.  Its multiline handling with -U
>>> also allows to search words ignoring any whitespace, even newlines.
>>> This is like isearch-lax-whitespace using search-whitespace-regexp
>>> when it contains a newline, e.g. "[ \t\r\n]+".
>>
>> Right. It has a problem of its own, though: it still outputs a file name
>> per line, even when a match is spread across several lines (unlike
>> pcregrep). So we're left guessing where a given multiline match ends.
>>
>> Also, 'sort' doesn't seem to be able to treat both : and \0 as separators
>> at the same time.
>>
>> Here's a rough patch, for illustration.
> 
> Thanks, now finally it's possible to search text ignoring whitespace
> between words, for example:
> 
>    Find regexp: file[ 	
> ]+names
> 
> finds everything correctly, even though current implementation maybe
> not the most elegant.
> 
>> It's kind of working, but I'm not loving it.
> 
> What do you think about using the option `rg --json`?
> Emacs has the fast JSON parsing library now, so using
> JSON output would be more reliable.

Very interesting. It returns better data, each multiline match is wholly 
in one entry instead of being spread across lines. Even the matches are 
annotated with match string/length/absolute position.

We should really investigate it, but perhaps a bit later, including our 
capability to parse it quickly when there are a lot of matches (>1000), 
and how said byte offsets interact with different file encodings.
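
To illustrate the encoding concern, here is a minimal sketch in Python (not Emacs Lisp, and not part of any patch in this thread). It assumes the tool reports offsets in bytes and that the offset falls on a character boundary; the helper name byte_to_char_offset is hypothetical.

```python
def byte_to_char_offset(raw: bytes, byte_offset: int, encoding: str = "utf-8") -> int:
    """Return the character index that corresponds to byte_offset in raw.

    Assumption: byte_offset lies on a character boundary in the given
    encoding; otherwise decoding the prefix raises UnicodeDecodeError.
    """
    # Decoding only the prefix counts the characters before the offset.
    return len(raw[:byte_offset].decode(encoding))


text = "naïve file\nnames"

# In UTF-8 the "ï" takes two bytes, so byte and character offsets diverge.
utf8 = text.encode("utf-8")
utf8_byte_pos = utf8.find(b"file")
utf8_char_pos = byte_to_char_offset(utf8, utf8_byte_pos, "utf-8")

# In Latin-1 every character is one byte, so the two offsets agree.
latin1 = text.encode("latin-1")
latin1_byte_pos = latin1.find(b"file")
latin1_char_pos = byte_to_char_offset(latin1, latin1_byte_pos, "latin-1")
```

The same match therefore has a different byte offset depending on how the file is encoded, while its character position stays the same.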

Also, its output is not one JSON document but a series of them 
(including ones with just search statistics which we'll want to skip), 
but some re-search-forward followed by (json-parse-buffer) should do the 
trick.
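
The "series of documents" shape can be sketched in Python as follows (an illustration only; an Emacs implementation would use json-parse-buffer as described above). The sample input is hand-written and abridged, modeled on the one-JSON-object-per-line messages that `rg --json` emits, and the function name iter_matches is hypothetical.

```python
import json


def iter_matches(output: str):
    """Yield one dict per "match" message, skipping everything else.

    Assumption: each line of output is one complete JSON object with a
    "type" field ("begin", "match", "context", "end" or "summary").
    """
    for line in output.split("\n"):
        if not line.strip():
            continue
        msg = json.loads(line)
        if msg.get("type") != "match":
            continue  # skip begin/end/summary (statistics) messages
        data = msg["data"]
        yield {
            # "text" may be absent when the content is not valid UTF-8.
            "path": data["path"].get("text"),
            "line_number": data["line_number"],
            "absolute_offset": data["absolute_offset"],
            "text": data["lines"].get("text"),
            "submatches": [(m["start"], m["end"]) for m in data["submatches"]],
        }


# Hand-written, abridged sample; not captured from a real ripgrep run.
SAMPLE = "\n".join(json.dumps(m) for m in [
    {"type": "begin", "data": {"path": {"text": "a.txt"}}},
    {"type": "match", "data": {
        "path": {"text": "a.txt"},
        "lines": {"text": "file\nnames\n"},
        "line_number": 3,
        "absolute_offset": 20,
        "submatches": [{"match": {"text": "file\nnames"}, "start": 0, "end": 10}],
    }},
    {"type": "end", "data": {"path": {"text": "a.txt"}, "stats": {}}},
    {"type": "summary", "data": {"stats": {}}},
])

matches = list(iter_matches(SAMPLE))
```

Here the multiline match arrives as a single entry whose text spans both lines, which is the property that makes this format attractive.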

In the meantime, here's a smaller patch using the traditional output 
format. I figure since there is a file name on each line anyway, --null 
doesn't help much. So it can be simplified a little (see attached).

Unfortunately, xref-replace-in-matches is broken for such multiline 
matches. And, of course, it merges together matches on adjacent lines, 
whether they are one match or several (that hasn't changed from the 
previous patch). So more investigation is needed.
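
The merging ambiguity can be shown with a small Python sketch (an illustration of the problem, not what the attached patch does). It assumes the hits have already been parsed into (path, line number, text) tuples sorted by path and line; the function name merge_adjacent is hypothetical.

```python
def merge_adjacent(hits):
    """Group hits on consecutive lines of the same file into one entry.

    hits: iterable of (path, line_number, text), sorted by path and line.
    With per-line output there is no way to tell whether adjacent lines
    belong to one multiline match or to several separate matches.
    """
    groups = []
    for path, line, text in hits:
        last = groups[-1] if groups else None
        if last and last["path"] == path and last["end"] + 1 == line:
            # Adjacent line in the same file: extend the current group.
            last["end"] = line
            last["text"] += "\n" + text
        else:
            groups.append({"path": path, "start": line, "end": line, "text": text})
    return groups


# Lines 3-4 are one multiline match and line 5 is a separate one,
# but the per-line output alone cannot express that.
hits = [
    ("a.txt", 3, "file"),
    ("a.txt", 4, "names"),
    ("a.txt", 5, "file names"),
    ("b.txt", 9, "file"),
]
groups = merge_adjacent(hits)
```

Lines 3 to 5 of a.txt end up in a single group even though they hold two distinct matches.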
[ripgrep-multiline.diff (text/x-patch, attachment)]

This bug report was last modified 4 years and 246 days ago.
