GNU bug report logs - #31796
26.1; dired-do-find-regexp-and-replace fails to find multiline regexps


Package: emacs;

Reported by: Žygimantas Bruzgys <me <at> zygi.xyz>

Date: Tue, 12 Jun 2018 07:56:03 UTC

Severity: minor

Found in version 26.1



Message #161 received at 31796 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: abela <at> chalmers.se, rms <at> gnu.org, 31796 <at> debbugs.gnu.org
Subject: Re: bug#31796: 27.1; dired-do-find-regexp-and-replace fails to find
 multiline regexps
Date: Wed, 2 Dec 2020 19:43:52 +0200

On 02.12.2020 19:39, Eli Zaretskii wrote:
>> Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>> Date: Wed, 2 Dec 2020 19:17:06 +0200
>>
>> On 02.12.2020 16:56, Eli Zaretskii wrote:
>>> The point is that our heuristics for detecting encoding are not
>>> perfect, so they could fail.
>>
>> Do you imagine Grep could use a more reliable detection algorithm?
> 
> No, I don't.  But it could allow the user to specify a different
> encoding for each file, as in
> 
>     grep --encoding=FOO FILES1* --encoding=BAR FILES2*

Not sure we can call it like that in an automated fashion (i.e. in 
project-find-regexp). But hey, somebody else could.

> etc.  And even if it just did the job of the same quality as we do, it
> will do it faster, which is why we use Grep in the first place, right?

That's true.

> The important part of the "enhancement" I described is actually the
> fact that the output gets encoded in a single encoding, no matter what
> was the encoding of the original files.  This makes reading and
> decoding the output simple and always correct.

Yes, OK.
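
To make the quoted idea concrete, here is a minimal Python sketch of a searcher that takes a per-file encoding and always emits its match records in UTF-8, whatever the source file used. This is only an illustration: the function name `grep_to_utf8` and its parameters are hypothetical, and the `--encoding` option quoted above is a proposal in this thread, not something the sketch assumes Grep has.

```python
import re


def grep_to_utf8(path, pattern, encoding):
    """Yield b'path:lineno:text' records, always encoded as UTF-8.

    `encoding` is the (assumed known) encoding of the file at `path`;
    the output encoding does not depend on it.
    """
    regex = re.compile(pattern)
    with open(path, encoding=encoding, errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if regex.search(line):
                text = line.rstrip("\n")
                yield f"{path}:{lineno}:{text}".encode("utf-8")
```

A consumer of this output can decode everything as UTF-8 without knowing how each searched file was encoded.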

>> Although... since it has to scan the full file anyway, it could first do
>> a quick detection, and then maybe rescan from the beginning if the
>> encoding turns out to be something else.
> 
> That'd be too late, as some matches were already output.

It could buffer them until the full file has been parsed. Encoding 
detection and conversion must add a certain overhead anyway, so I'm not 
sure how expensive the extra buffering would be in comparison.

As a bonus, per-file buffering like that would allow easier 
parallelization of searches.
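
A rough Python sketch of the buffering idea, under stated assumptions: the "quick detection" is simply "try UTF-8 first", the fallback is Latin-1, and all names (`search_file`, `search_files`, `encodings`, `workers`) are hypothetical. Matches for a file are held in a list until the whole file has been decoded; if the first guess fails partway through, the buffer is dropped and the file is rescanned. Because each file yields one self-contained buffer, files can be searched concurrently and their results emitted in order.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def search_file(path, pattern, encodings=("utf-8", "latin-1")):
    """Return buffered (path, lineno, text) matches for one file.

    Each candidate encoding is tried in turn. Matches are only
    returned once the whole file decoded cleanly under that guess.
    """
    regex = re.compile(pattern)
    for encoding in encodings:
        buffered = []
        try:
            with open(path, encoding=encoding) as f:
                for lineno, line in enumerate(f, start=1):
                    if regex.search(line):
                        buffered.append((path, lineno, line.rstrip("\n")))
        except UnicodeDecodeError:
            continue  # wrong guess: discard the buffer and rescan
        return buffered
    return []


def search_files(paths, pattern, workers=4):
    """Search files concurrently; yield UTF-8 records in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: search_file(p, pattern), paths)
        for matches in results:
            for path, lineno, text in matches:
                yield f"{path}:{lineno}:{text}".encode("utf-8")
```

Threads are used here only to show that per-file buffers make the work easy to split; whether this pays off in practice depends on the relative cost of I/O, decoding, and matching, which the sketch does not measure.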




This bug report was last modified 4 years and 246 days ago.
