GNU bug report logs - #31796
26.1; dired-do-find-regexp-and-replace fails to find multiline regexps
To reply to this bug, email your comments to 31796 AT debbugs.gnu.org.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 12 Jun 2018 07:56:04 GMT)
Acknowledgement sent to Žygimantas Bruzgys <me <at> zygi.xyz>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 12 Jun 2018 07:56:04 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
1) Create ~/test with a file with following contents:
multi
line
2) Visit directory using dired: C-f ~/test
3) Initiate regexp-replace by hitting Q
4) multi[[:space:]]line RET singeline RET
5) See that dired regexp replace failed reporting that no results were
found
6) Visit a file you have just created.
7) Initiate query-replace-regexp with C-M-%
8) Accept the suggested (previous) query-replace by hitting RET
9) See that the query is actually correct and finds the result.
In GNU Emacs 26.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.22.30)
of 2018-05-29 built on juergen
Windowing system distributor 'The X.Org Foundation', version 11.0.12000000
Recent messages:
user-error: No matches for: multi[[:space:]]line
Mark set
Replaced 1 occurrence
Configured using:
'configure --prefix=/usr --sysconfdir=/etc --libexecdir=/usr/lib
--localstatedir=/var --with-x-toolkit=gtk3 --with-xft --with-modules
'CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong
-fno-plt' CPPFLAGS=-D_FORTIFY_SOURCE=2
LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now'
Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS NOTIFY
ACL GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS
GTK3 X11 MODULES THREADS LIBSYSTEMD LCMS2
Important settings:
value of $LC_COLLATE: de_CH.UTF-8
value of $LC_MONETARY: de_CH.UTF-8
value of $LC_NUMERIC: de_CH.UTF-8
value of $LC_TIME: lt_LT.UTF-8
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Dired by name
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
buffer-read-only: t
column-number-mode: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message rmc puny seq format-spec rfc822
mml mml-sec password-cache epa derived epg epg-config gnus-util rmail
rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils cl-extra help-mode easymenu find-dired
semantic/fw mode-local find-func xref cl-seq project eieio byte-opt
bytecomp byte-compile cconv eieio-core cl-macs gv eieio-loaddefs grep
compile comint ansi-color ring thingatpt dired-aux cl-loaddefs cl-lib
dired dired-loaddefs elec-pair leuven-theme time-date mule-util tooltip
eldoc electric uniquify ediff-hook vc-hooks lisp-float-type mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode elisp-mode
lisp-mode prog-mode register page menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote dbusbind inotify lcms2 dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)
Memory information:
((conses 16 123229 14374)
(symbols 48 22491 4)
(miscs 40 103 176)
(strings 32 34463 1497)
(string-bytes 1 948663)
(vectors 16 16681)
(vector-slots 8 526030 13780)
(floats 8 78 135)
(intervals 56 300 0)
(buffers 992 14))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 12 Jun 2018 10:18:02 GMT)
Message #8 received at 31796 <at> debbugs.gnu.org:
Žygimantas Bruzgys <me <at> zygi.xyz> writes:
> 1) Create ~/test with a file with following contents:
> multi
> line
>
> 2) Visit directory using dired: C-f ~/test
> 3) Initiate regexp-replace by hitting Q
> 4) multi[[:space:]]line RET singeline RET
> 5) See that dired regexp replace failed reporting that no results were
> found
> 6) Visit a file you have just created.
> 7) Initiate query-replace-regexp with C-M-%
> 8) Accept the suggested (previous) query-replace by hitting RET
> 9) See that the query is actually correct and finds the result.
As the docstring of dired-do-find-regexp-and-replace says:
REGEXP should use constructs supported by your local ‘grep’ command.
grep matches single lines, so multiline matching won't work.
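The single-line behavior can be reproduced outside Emacs with a small shell sketch. The file names and the `search` helper below are made up for illustration; the regexp is the one from the report. `[[:space:]]` is a POSIX bracket class that grep accepts, so the second file serves as a control: the regexp itself is valid, and only the line split prevents the match.

```shell
# Scratch directory for the two sample files (names are made up for this demo).
tmpdir=$(mktemp -d)
printf 'multi\nline\n' > "$tmpdir/two-lines.txt"
printf 'multi line\n'  > "$tmpdir/one-line.txt"

# Hypothetical helper: print "match" or "no match" for the report's regexp.
search() {
  if grep -q 'multi[[:space:]]line' "$1"; then
    echo "match"
  else
    echo "no match"
  fi
}

two_lines=$(search "$tmpdir/two-lines.txt")   # the newline separates two records
one_line=$(search "$tmpdir/one-line.txt")     # same regexp, text on one line

echo "two-lines.txt: $two_lines"
echo "one-line.txt:  $one_line"
rm -rf "$tmpdir"
```

The first file should report "no match" and the second "match", since grep tests each line separately and never sees the newline between "multi" and "line".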
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 23 Nov 2020 09:10:01 GMT)
Message #11 received at 31796 <at> debbugs.gnu.org:
The dired-do-find-regexp-and-replace command does not seem to parse the
regex entered by the user correctly. If the regex string contains a
newline character (^Q^J), it seems that the parsing stops there. At
least I have seen errors like "unmatched bracket" and the like.
Anyhow, I did not get it to replace multiline text. I found an answer
here:
https://emacs.stackexchange.com/questions/30437/dired-search-and-replace-is-throwing-no-results
The solution is to manually invoke dired-do-query-replace-regexp
(instead of pressing just Q).
However, this solution is hard to discover, because it is unexpected
that the official regex-replace feature (key Q) contains such a blunder.
- Why isn't the more robust
dired-do-query-replace-regexp
bound to Q?
- Why not fix the bug in dired-do-find-regexp-and-replace? It has been
reported for version 26 already, and it is not a minor issue. Replacing
interactively in several files is an **extremely** useful feature, and I
would not want to do something like that outside of emacs.
Thanks for all the good work going into emacs.
Best,
Andreas
P.S.: Your approach to issue tracking (by email) must be considered
stone-age by now. How about switching to GitHub / GitLab or the like?
(Unless you want to keep the bar up, of course. But this is hardly in
the spirit of open source.)
--
Andreas Abel <>< You are the beloved human being.
Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden
andreas.abel <at> gu.se
http://www.cse.chalmers.se/~abela/
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 23 Nov 2020 15:24:01 GMT)
Message #14 received at 31796 <at> debbugs.gnu.org:
> From: Andreas Abel <abela <at> chalmers.se>
> Date: Mon, 23 Nov 2020 10:09:38 +0100
>
> - Why not fix the bug in dired-do-find-regexp-and-replace? It has been
> reported for version 26 already, and it is not a minor issue.
I think we'd love to fix this, but we don't know how. Patches are
welcome.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 23 Nov 2020 16:19:02 GMT)
Message #17 received at 31796 <at> debbugs.gnu.org:
> The dired-do-find-regexp-and-replace command does not seem
> to parse the regex entered by the user correctly. If the
> regex string contains a newline character (^Q^J), it seems
> that the parsing stops there. At least I have seen errors
> like "unmatched bracket" and the like.
>
> Anyhow, I did not get it to replace multiline text.
> I found an answer here:
>
> https://emacs.stackexchange.com/questions/30437/dired-search-and-replace-is-throwing-no-results
>
> The solution is to manually invoke
> dired-do-query-replace-regexp (instead of pressing just Q).
>
> However, this solution is hard to discover, because it is
> unexpected that the official regex-replace feature (key Q)
> contains such a blunder.
>
> - Why isn't the more robust dired-do-query-replace-regexp
> bound to Q?
It _was_ bound to `Q' - for decades. But the inventor
of `dired-do-find-regexp-and-replace' decided to give
that binding to his command. (I argued in vain in
favor of giving the new command a different binding,
keeping `Q' as it was. Similarly for `A'.)
> - Why not fix the bug in dired-do-find-regexp-and-replace?
> It has been reported for version 26 already, and it is not
> a minor issue. Replacing interactively in several files is
> an **extremely** useful feature, and I would not want to
> do something like that outside of emacs.
+1.
___
FWIW, Dired+ binds `dired-do-query-replace-regexp'
to `M-q' (respecting the new binding of `Q' to
`dired-do-find-regexp-and-replace', though I
disagree with it). And Dired+ has both commands
on the menus:
Multiple > Search >
Query Replace Using TAGS Table... M-q
Query Replace Using `find'... Q
https://www.emacswiki.org/emacs/DiredPlus
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 23 Nov 2020 21:23:01 GMT)
Message #20 received at 31796 <at> debbugs.gnu.org:
On 23.11.2020 18:16, Drew Adams wrote:
> But the inventor
> of `dired-do-find-regexp-and-replace' decided to give
> that binding to his command.
It wasn't me who made this decision.
> (I argued in vain in
> favor of giving the new command a different binding,
> keeping `Q' as it was. Similarly for `A'.)
...but there would be no reason for me to write it, if that was the
change proposed.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 23 Nov 2020 21:27:02 GMT)
Message #23 received at 31796 <at> debbugs.gnu.org:
On 12.06.2018 13:17, Noam Postavsky wrote:
> As the docstring of dired-do-find-regexp-and-replace says:
>
> REGEXP should use constructs supported by your local ‘grep’ command.
>
> grep matches single lines, so multiline matching won't work.
*Apparently* 'grep -P -z' can do multiline matches. But I don't know how
portable that is, and the grep manual calls this combination "experimental".
But if we can, and if we change grep-regexp-alist somehow to support
\0-delimited results (-P without -z doesn't do multiline),
xref-matches-in-files could use these flags and get multiline results.
[[:space:]] still wouldn't work, though: it's an Emacs-only extension.
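A rough shell sketch of the difference between `-P` alone and `-P -z`, assuming GNU grep built with PCRE support (other greps may reject either flag). The file name and the `try_grep` helper are hypothetical, and `\s` is the PCRE whitespace escape used here in place of the report's bracket class.

```shell
# Assumes GNU grep with PCRE support; on other greps both calls report "no match".
tmpdir=$(mktemp -d)
printf 'multi\nline\n' > "$tmpdir/sample.txt"

# Hypothetical helper: report whether grep finds the pattern with the given flags.
try_grep() {
  if grep "$@" -q 'multi\sline' "$tmpdir/sample.txt" 2>/dev/null; then
    echo "match"
  else
    echo "no match"
  fi
}

pcre_only=$(try_grep -P)      # records are still single lines
pcre_nul=$(try_grep -P -z)    # records end at NUL, so the whole file is one record

echo "-P alone: $pcre_only"
echo "-P -z:    $pcre_nul"
rm -rf "$tmpdir"
```

With `-z`, grep also terminates each output record with a NUL byte instead of a newline, which is why the result parsing would need to handle \0-delimited results.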
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 23 Nov 2020 21:29:01 GMT)
Message #26 received at 31796 <at> debbugs.gnu.org:
On 23.11.2020 11:09, Andreas Abel wrote:
> - Why isn't the more robust
>
> dired-do-query-replace-regexp
>
> bound to Q?
Which is the "more robust", though? dired-do-query-replace-regexp
doesn't work with Tramp. dired-do-find-regexp-and-replace does.
And even if the former is fixed to work, the latter will work much
faster remotely. It's also going to be faster in many "local" cases too.
If we don't manage to find a portable enough solution to do multiline
searches, we could at least warn the user interactively about
unsupported features, though.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 23 Nov 2020 23:50:02 GMT)
Message #29 received at 31796 <at> debbugs.gnu.org:
With a software as old as emacs the most important feature is
1. backwards-compatibility
The second most important feature is
2. backwards-compatibility
The third most important feature is
3. backwards-compatibility
It is like with C and LaTeX. If you cannot ensure that things keep
working as they did, don't change anything.
Tramp? I had to google this term.
How often do programmers work on their local files in their day-to-day
business, how often with remote files via tramp?
If you contribute a new feature for 0.1% of the use cases but
disrupt something (even minor) for 99.9% of the use cases, then with an
old tool like emacs the choice is: don't replace the old functionality
with your new functionality.
Just don't break things. Please.
If you want fancy functionality that works with remote files, this is
fine. There are enough keys on the keyboard you can bind the new
functionality to.
Please don't break things that worked.
There are gazillion emacs users out there that dread each new emacs
version because it will break their setup, their workflows, their
habits. We do not want to spend days after upgrades to get our work
environment back.
We value stability and conservativity over everything else.
Thanks to everyone who contributes to emacs. --Andreas
On 2020-11-23 22:28, Dmitry Gutov wrote:
> On 23.11.2020 11:09, Andreas Abel wrote:
>> - Why isn't the more robust
>>
>> dired-do-query-replace-regexp
>>
>> bound to Q?
>
> Which is the "more robust", though? dired-do-query-replace-regexp
> doesn't work with Tramp. dired-do-find-regexp-and-replace does.
>
> And even if the former is fixed to work, the latter will work much
> faster remotely. It's also going to be faster in many "local" cases too.
>
> If we don't manage to find a portable enough solution to do multiline
> searches, we could at least warn the user interactively about
> unsupported features, though.
--
Andreas Abel <>< You are the beloved human being.
Department of Computer Science and Engineering
Chalmers and Gothenburg University, Sweden
andreas.abel <at> gu.se
http://www.cse.chalmers.se/~abela/
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 00:14:01 GMT)
Message #32 received at 31796 <at> debbugs.gnu.org:
On 24.11.2020 01:49, Andreas Abel wrote:
> With a software as old as emacs the most important feature is
>
> 1. backwards-compatibility
>
> The second most important feature is
>
> 2. backwards-compatibility
>
> The third most important feature is
>
> 3. backwards-compatibility
No.
That's a road toward irrelevance.
> It is like with C and LaTeX. If you cannot ensure that things keep
> working as they did, don't change anything.
>
> Tramp? I had to google this term.
Tramp has been with us for ~20 years, and ~10 years a part of Emacs. It
has a significant number of users.
Anyway, that Tramp fix was a happy side-effect. Now that I think back,
the main reason was the switch to the new interface which removed the
default binding for tags-loop-continue (now called fileloop-continue).
Which made using dired-do-search a little less convenient, and people
asked for analogous commands which used the xref UI. The original
commands are still with us, though.
> How often do programmers work on their local files in their day-to-day
> business, how often with remote files via tramp?
>
> If you contribute a new feature for 0.1% of the use cases but
> disrupt something (even minor) for 99.9% of the use cases, then with an
> old tool like emacs the choice is: don't replace the old functionality
> with your new functionality.
>
> Just don't break things. Please.
I'm sorry for the inconvenience, really. But not being able to break
anything, even, is an ever-growing cost on keeping Emacs relevant toward
contemporary expectations, or otherwise making it better.
> If you want fancy functionality that works with remote files, this is
> fine. There are enough keys on the keyboard you can bind the new
> functionality to.
>
> Please don't break things that worked.
>
> There are gazillion emacs users out there that dread each new emacs
> version because it will break their setup, their workflows, their
> habits. We do not want to spend days after upgrades to get our work
> environment back.
But you still upgrade to the new version? Expecting something new from
it, right?
> We value stability and conservativity over everything else.
And then Emacs users get older, change jobs, or entirely leave the
profession. If Emacs stays as it was 30 years ago, it will appeal only
to users who started with it 30+ years ago. And many of those have
already left.
Emacs users are an admirably faithful bunch, but there are forces of
nature we have to contend with as well.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 01:20:01 GMT)
Message #35 received at 31796 <at> debbugs.gnu.org:
On 24.11.2020 02:13, Dmitry Gutov wrote:
> switch to the new interface which removed the default binding for
                    ^ xref
Specifically, the new bindings for 'M-.' and 'M-,'.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 15:17:02 GMT)
Message #38 received at 31796 <at> debbugs.gnu.org:
> From: Andreas Abel <abela <at> chalmers.se>
> Date: Tue, 24 Nov 2020 00:49:24 +0100
>
> We value stability and conservativity over everything else.
We do, too. If you stick around for a while, you will see how many
discussions here are due to the determination not to introduce even
the slightest risk of breaking compatibility with existing behavior.
In fact, some of the passion in Dmitry's response wasn't directed at
you, it was directed at myself and other senior maintainers who
frequently object to changes and/or request complicated
backward-compatibility shims, for that very reason.
So please don't assume we don't care about stability, or don't care
enough. It would be simply unfair to make such assumptions. We
certainly don't need lectures about keeping Emacs stable and
compatible.
What you see in this case is not the result of negligence or
carelessness, it is the result of not being aware of this (relatively
rare) use case becoming broken when we changed the UI of this and
similar commands to a more convenient one. It took time for people to
report the problem, and it takes us more time to come up with a good
solution. That's all.
If you have practical ideas for how to support these use cases with
the current command, please describe them. TIA.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 15:45:02 GMT)
Message #41 received at 31796 <at> debbugs.gnu.org:
On 24.11.2020 17:16, Eli Zaretskii wrote:
> In fact, some of the passion in Dmitry's response wasn't directed at
> you, it was directed at myself and other senior maintainers who
> frequently object to changes and/or request complicated
> backward-compatibility shims, for that very reason.
In a way, perhaps. Even though I've been on the other side of these
discussions as well.
But I was mostly pointing out a logical incompatibility to a user who
installs a new release, but doesn't want to see anything change, ever.
> So please don't assume we don't care about stability, or don't care
> enough. It would be simply unfair to make such assumptions. We
> certainly don't need lectures about keeping Emacs stable and
> compatible.
That's true.
> What you see in this case is not the result of negligence or
> carelessness, it is the result of not being aware of this (relatively
> rare) use case becoming broken when we changed the UI of this and
> similar commands to a more convenient one. It took time for people to
> report the problem, and it takes us more time to come up with a good
> solution. That's all.
We've been aware of it for at least two years now. So what are we, then,
negligent, careless, or incompetent?
If you're saying we can't afford to break even a minor feature like
this, I don't think there are a lot of options.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 16:36:01 GMT)
Message #44 received at 31796 <at> debbugs.gnu.org:
> Cc: 31796 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> Date: Tue, 24 Nov 2020 17:43:53 +0200
>
> We've been aware of it for at least two years now. So what are we, then,
> negligent, careless, or incompetent?
Busy. That, and the fact that no one came up with a clear idea of how
to fix this (at least IIRC).
> If you're saying we can't afford to break even a minor feature like
> this, I don't think there are a lot of options.
We should try not to break any features, yes. AFAIK, no one has yet
claimed that this cannot be fixed. So the decision whether we can or
cannot stay with this broken doesn't have to be made yet.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 19:30:02 GMT)
Message #47 received at 31796 <at> debbugs.gnu.org:
> Multiple > Search >
> Query Replace Using TAGS Table... M-q
> Query Replace Using `find'... Q
dired-do-find-regexp-and-replace could be left bound to Q, but
dired-do-query-replace-regexp could be bound to M-% in Dired.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 19:30:03 GMT)
Message #50 received at 31796 <at> debbugs.gnu.org:
> dired-do-query-replace-regexp doesn't work with Tramp.
Really? I checked it and see no problems.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 19:40:02 GMT)
Message #53 received at 31796 <at> debbugs.gnu.org:
On 24.11.2020 21:29, Juri Linkov wrote:
>> dired-do-query-replace-regexp doesn't work with Tramp.
> Really? I checked it and see no problems
Sorry, a clarification: it doesn't work on directories.
Which seems to be a conscious choice because with how dired-do-search
and dired-do-query-replace-regexp are implemented, it would take a
lot of time even when there are not too many files in such a directory.
It has to copy each file to the local machine before doing the search.
dired-do-find-regexp and dired-do-find-regexp-and-replace handle
directories just fine, however.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 19:44:02 GMT)
Message #56 received at 31796 <at> debbugs.gnu.org:
On 24.11.2020 18:35, Eli Zaretskii wrote:
>> Cc: 31796 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>> Date: Tue, 24 Nov 2020 17:43:53 +0200
>>
>> We've been aware of it for at least two years now. So what are we, then,
>> negligent, careless, or incompetent?
>
> Busy. That, and the fact that no one came up with a clear idea of how
> to fix this (at least IIRC).
How about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31796#23 ?
Someone more familiar with existing ports of Grep on different systems
should weigh in on it.
>> If you're saying we can't afford to break even a minor feature like
>> this, I don't think there are a lot of options.
>
> We should try not to break any features, yes.
That's just common sense.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 20:13:01 GMT)
Message #59 received at 31796 <at> debbugs.gnu.org:
> > Multiple > Search >
> > Query Replace Using TAGS Table... M-q
> > Query Replace Using `find'... Q
>
> dired-do-find-regexp-and-replace could be left bound to Q, but
> dired-do-query-replace-regexp could be bound to M-% in Dired.
For the latter, I use `M-q' (not `M-%').
I suggest that vanilla Emacs do the same.
These two commands have quite similar purposes.
I suggest that they have similar keys.
Also, `M-%' has its normal meaning when Dired
has been toggled to writable (WDired). That
key should be kept for its normal purpose, IMO.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 20:17:01 GMT)
Message #62 received at 31796 <at> debbugs.gnu.org:
> Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> Date: Tue, 24 Nov 2020 21:43:22 +0200
>
> How about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31796#23 ?
The idea sounds fine to me.
> Someone more familiar with existing ports of Grep on different systems
> should weigh in on it.
I don't think it's necessary. We just need to probe Grep for support
of these switches, and then use it. The result cannot be worse than
it is now.
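Such a probe might look like the following at the shell level. This is only a sketch: the function name `grep_supports_multiline` and the variable `grep_flags` are hypothetical, and in Emacs the check would be made from Lisp rather than from a script.

```shell
# Hypothetical helper: succeed only if the local grep accepts -P and -z together
# and really matches across a newline when given them.
grep_supports_multiline() {
  printf 'a\nb\n' | grep -P -z -q 'a\sb' 2>/dev/null
}

if grep_supports_multiline; then
  grep_flags="-P -z"
else
  grep_flags=""   # fall back to the existing line-based search
fi

echo "extra grep flags: ${grep_flags:-none}"
```

Testing an actual cross-line match, rather than only checking that the options are accepted, guards against a grep that takes the flags but treats them differently.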
> >> If you're saying we can't afford to break even a minor feature like
> >> this, I don't think there are a lot of options.
> >
> > We should try not to break any features, yes.
>
> That's just common sense.
Of course.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 20:20:02 GMT)
Message #65 received at 31796 <at> debbugs.gnu.org:
> From: Juri Linkov <juri <at> linkov.net>
> Date: Tue, 24 Nov 2020 21:28:29 +0200
> Cc: Andreas Abel <abela <at> chalmers.se>, 31796 <at> debbugs.gnu.org
>
> > Multiple > Search >
> > Query Replace Using TAGS Table... M-q
> > Query Replace Using `find'... Q
>
> dired-do-find-regexp-and-replace could be left bound to Q, but
> dired-do-query-replace-regexp could be bound to M-% in Dired.
How will this help when the command to continue the loop is not bound
to any key?
We didn't just change the binding of Q without a good reason.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 20:35:02 GMT)
Message #68 received at 31796 <at> debbugs.gnu.org:
> > > Multiple > Search >
> > > Query Replace Using TAGS Table... M-q
> > > Query Replace Using `find'... Q
> >
> > dired-do-find-regexp-and-replace could be left bound to Q, but
> > dired-do-query-replace-regexp could be bound to M-% in Dired.
>
> How will this help when the command to continue the loop is not bound
> to any key?
I don't understand the question. And which
command? Are you asking how to use `M-q'
(`dired-do-query-replace-regexp')?
Are you saying that even though Emacs has kept
`dired-do-query-replace-regexp' it's no longer
usable for some reason?
> We didn't just change the binding of Q without a good reason.
So you say. I've already disagreed that the
reason given was a good one. IMHO, the new
command should have been given a new key.
Regardless of whether the existing key `Q'
should have been usurped, its previous command
still exists, and it seems to still be usable
and useful. If so, what is wrong with giving
it its own key binding (`M-q' in my case)?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 20:41:01 GMT)
Message #71 received at 31796 <at> debbugs.gnu.org:
>> dired-do-find-regexp-and-replace could be left bound to Q, but
>> dired-do-query-replace-regexp could be bound to M-% in Dired.
>
> How will this help when the command to continue the loop is not bound
> to any key?
dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
keys and automatically moves to the next file on multiple files.
So it seems it doesn't need a key to continue the loop.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 20:52:01 GMT)
Message #74 received at 31796 <at> debbugs.gnu.org:
> >> dired-do-find-regexp-and-replace could be left bound to Q, but
> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> >
> > How will this help when the command to continue the loop is not bound
> > to any key?
>
> dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
> keys and automatically moves to the next file on multiple files.
> So it seems it doesn't need a key to continue the loop.
Yes. (Now I understand the question. Thx.)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 21:09:02 GMT)
Message #77 received at 31796 <at> debbugs.gnu.org:
> From: Juri Linkov <juri <at> linkov.net>
> Cc: drew.adams <at> oracle.com, abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> Date: Tue, 24 Nov 2020 22:31:55 +0200
>
> >> dired-do-find-regexp-and-replace could be left bound to Q, but
> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> >
> > How will this help when the command to continue the loop is not bound
> > to any key?
>
> dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
> keys and automatically moves to the next file on multiple files.
> So it seems it doesn't need a key to continue the loop.
AFAIR, it does need a way to continue the loop if the user exits the
loop.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 24 Nov 2020 21:38:02 GMT)
Message #80 received at 31796 <at> debbugs.gnu.org:
> > >> dired-do-find-regexp-and-replace could be left bound to Q, but
> > >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> > >
> > > How will this help when the command to continue the loop is not
> > > bound to any key?
> >
> > dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
> > keys and automatically moves to the next file on multiple files.
> > So it seems it doesn't need a key to continue the loop.
>
> AFAIR, it does need a way to continue the loop if the user exits the
> loop.
If that feature is needed and broken, then that's true
for the command itself (`dired-do-query-replace-regexp'),
right?
It has nothing to do with whether or not that command
has a key binding, and even less to do with whether it
has the key binding `Q'. No?
I guess you're (not saying but hinting?) that the
decision to take key `Q' away from that command also
took away the ability to continue the loop if the user
exits it. If so, that too is (apparently) unfortunate.
But what does that have to do with giving that command a
key binding (e.g. `M-q')?
What am I missing?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 25 Nov 2020 07:55:02 GMT)
Message #83 received at 31796 <at> debbugs.gnu.org:
>> >> dired-do-find-regexp-and-replace could be left bound to Q, but
>> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
>> >
>> > How will this help when the command to continue the loop is not bound
>> > to any key?
>>
>> dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
>> keys and automatically moves to the next file on multiple files.
>> So it seems it doesn't need a key to continue the loop.
>
> AFAIR, it does need a way to continue the loop if the user exits the
> loop.
Where would the users get the idea that it's possible to interrupt
query-replace and resume it anytime later, if single-file query-replace
doesn't support this feature? I can't find where this feature
of continuing the loop is documented. (info "(emacs) Query Replace")
only says:
To restart a ‘query-replace’ once it is exited, use ‘C-x <ESC>
<ESC>’, which repeats the ‘query-replace’ because it used the minibuffer
to read its arguments. *Note C-x <ESC> <ESC>: Repetition.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 25 Nov 2020 07:55:02 GMT)
Message #86 received at 31796 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
>> > Multiple > Search >
>> > Query Replace Using TAGS Table... M-q
>> > Query Replace Using `find'... Q
>>
>> dired-do-find-regexp-and-replace could be left bound to Q, but
>> dired-do-query-replace-regexp could be bound to M-% in Dired.
>
> For the latter, I use `M-q' (not `M-%').
> I suggest that vanilla Emacs do the same.
>
> These two commands have quite similar purposes.
> I suggest that they have similar keys.
>
> Also, `M-%' has its normal meaning when Dired
> has been toggled to writable (WDired). That
> key should be kept for its normal purpose, IMO.
'M-q' has its normal meaning of filling the paragraph,
so it would be confusing to give it another meaning in Dired.
While finding a good short key would be nice, here is a patch
that for consistency with 'M-s a M-C-s' also adds 'M-s a M-C-%':
[query-replace-regexp.patch (text/x-diff, inline)]
diff --git a/lisp/vc/vc-dir.el b/lisp/vc/vc-dir.el
index cdf8ab984e..bed779104c 100644
--- a/lisp/vc/vc-dir.el
+++ b/lisp/vc/vc-dir.el
@@ -308,6 +308,7 @@ vc-dir-mode-map
(define-key map "Q" 'vc-dir-query-replace-regexp)
(define-key map (kbd "M-s a C-s") 'vc-dir-isearch)
(define-key map (kbd "M-s a M-C-s") 'vc-dir-isearch-regexp)
+ (define-key map (kbd "M-s a M-C-%") 'vc-dir-query-replace-regexp)
(define-key map "G" 'vc-dir-ignore)
(let ((branch-map (make-sparse-keymap)))
diff --git a/lisp/dired.el b/lisp/dired.el
index 08b19a0225..6cbcc17852 100644
--- a/lisp/dired.el
+++ b/lisp/dired.el
@@ -1932,6 +1932,7 @@ dired-mode-map
;; isearch
(define-key map (kbd "M-s a C-s") 'dired-do-isearch)
(define-key map (kbd "M-s a M-C-s") 'dired-do-isearch-regexp)
+ (define-key map (kbd "M-s a M-C-%") 'dired-do-query-replace-regexp)
(define-key map (kbd "M-s f C-s") 'dired-isearch-filenames)
(define-key map (kbd "M-s f M-C-s") 'dired-isearch-filenames-regexp)
;; misc
@@ -2214,9 +2215,12 @@ dired-mode-map
(define-key map [menu-bar operate dashes-3]
'("--"))
- (define-key map [menu-bar operate query-replace]
- '(menu-item "Query Replace in Files..." dired-do-find-regexp-and-replace
- :help "Replace regexp matches in marked files"))
+ (define-key map [menu-bar operate find-regexp-and-replace]
+ '(menu-item "Replace Regexp in Files..." dired-do-find-regexp-and-replace
+ :help "Replace regexp matches in marked files"))
+ (define-key map [menu-bar operate query-replace-regexp]
+ '(menu-item "Query Replace in Files..." dired-do-query-replace-regexp
+ :help "Replace regexp matches in marked files"))
(define-key map [menu-bar operate search]
'(menu-item "Search Files..." dired-do-find-regexp
:help "Search marked files for matches of regexp"))
diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index a395453491..7b8dcc2096 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -598,7 +598,7 @@ project-prefix-map
(define-key map "p" 'project-switch-project)
(define-key map "g" 'project-find-regexp)
(define-key map "G" 'project-or-external-find-regexp)
- (define-key map "r" 'project-query-replace-regexp)
+ (define-key map [?\C-\M-%] 'project-query-replace-regexp)
map)
"Keymap for project commands.")
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 25 Nov 2020 15:49:02 GMT)
Message #89 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: drew.adams <at> oracle.com, abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> Date: Wed, 25 Nov 2020 09:28:57 +0200
>
> >> >> dired-do-find-regexp-and-replace could be left bound to Q, but
> >> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> >> >
> >> > How will this help when the command to continue the loop is not bound
> >> > to any key?
> >>
> >> dired-do-query-replace-regexp works like normal 'M-%' with 'y/n/!'
> >> keys and automatically moves to the next file on multiple files.
> >> So it seems it doesn't need a key to continue the loop.
> >
> > AFAIR, it does need a way to continue the loop if the user exits the
> > loop.
>
> Where would the users get the idea that it's possible to interrupt
> query-replace and resume it anytime later, if single-file query-replace
> doesn't support this feature?
In the manual. And in their muscle memory: we are talking about users
who knew about the original binding of Q in Dired, so we should assume
they also know about the possibility of exiting the loop and then
resuming it.
The command that was previously bound to Q used the UI that is very
similar to find-tag: you are presented with the first hit, and then go
to the next one, and the one after it, etc. "Exiting the loop" can be
as simple as moving point or switching to another buffer to consult
some other part of Emacs. It is very natural. Once you've done that,
you'd want to resume the loop.
> I can't find where this feature of continuing the loop is
> documented. (info "(emacs) Query Replace") only says:
>
> To restart a ‘query-replace’ once it is exited, use ‘C-x <ESC>
> <ESC>’, which repeats the ‘query-replace’ because it used the minibuffer
> to read its arguments. *Note C-x <ESC> <ESC>: Repetition.
Wrong part of the manual, and the text which described that was
removed from the manual when we changed the binding. Visit the Emacs
24 manual and go to "Operating on Files", a section of the "Dired"
chapter. There you will see this text:
`Q REGEXP <RET> TO <RET>'
Perform `query-replace-regexp' on each of the specified files,
replacing matches for REGEXP with the string TO
(`dired-do-query-replace-regexp').
This command is a variant of `tags-query-replace'. If you exit the
query replace loop, you can use `M-,' to resume the scan and
replace more matches. *Note Tags Search::.
The new UI presents all the hits in a separate window, so you can
easily use that to go to any hit you want even if you exit the loop.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 25 Nov 2020 17:39:01 GMT)
Message #92 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> >> > Multiple > Search >
> >> > Query Replace Using TAGS Table... M-q
> >> > Query Replace Using `find'... Q
> >>
> >> dired-do-find-regexp-and-replace could be left bound to Q, but
> >> dired-do-query-replace-regexp could be bound to M-% in Dired.
> >
> > For the latter, I use `M-q' (not `M-%').
> > I suggest that vanilla Emacs do the same.
> >
> > These two commands have quite similar purposes.
> > I suggest that they have similar keys.
> >
> > Also, `M-%' has its normal meaning when Dired
> > has been toggled to writable (WDired). That
> > key should be kept for its normal purpose, IMO.
>
> 'M-q' has its normal meaning of filling the paragraph,
> so it would be confusing to give it another meaning in Dired.
How do you think filling a paragraph is useful
in Dired (or WDired)? I don't follow you, here.
> While finding a good short key would be nice, here is a patch
> that for consistency with 'M-s a M-C-s' also adds 'M-s a M-C-%':
Count me out as favorable for that suggestion.
(Just one opinion.)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 25 Nov 2020 20:26:01 GMT)
Message #95 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> The command that was previously bound to Q used the UI that is very
> similar to find-tag: you are presented with the first hit, and then go
> to the next one, and the one after it, etc. "Exiting the loop" can be
> as simple as moving point or switching to another buffer to consult
> some other part of Emacs. It is very natural. Once you've done that,
> you'd want to resume the loop.
Would adding `M-s a M-C-%' help users who want the old behavior back?
Or is a keybinding for `fileloop-continue' needed as well?
> This command is a variant of `tags-query-replace'. If you exit the
> query replace loop, you can use `M-,' to resume the scan and
> replace more matches. *Note Tags Search::.
Maybe `M-s M-,' is not bad for `fileloop-continue'?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 25 Nov 2020 20:31:01 GMT)
Message #98 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: drew.adams <at> oracle.com, abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> Date: Wed, 25 Nov 2020 22:18:28 +0200
>
> > The command that was previously bound to Q used the UI that is very
> > similar to find-tag: you are presented with the first hit, and then go
> > to the next one, and the one after it, etc. "Exiting the loop" can be
> > as simple as moving point or switching to another buffer to consult
> > some other part of Emacs. It is very natural. Once you've done that,
> > you'd want to resume the loop.
>
> Would adding `M-s a M-C-%' help users who want the old behavior back?
> Or is a keybinding for `fileloop-continue' needed as well?
I'd prefer not to add fileloop-continue back in any shape or form.
I'd like us to fix the current binding of Q so that it supports
everything the previous command did. Bringing back the commands we
obsoleted is counter-productive.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Sun, 29 Nov 2020 02:31:01 GMT)
Message #101 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 25.11.2020 22:30, Eli Zaretskii wrote:
> I'd like us to fix the current binding of Q so that it supports
> everything the previous command did.
Just how much of "everything" are we talking about?
For instance, a number of character classes in Emacs regexps are
dependent on the syntax table. Like [:word:], for instance.
Even [:space:] is dependent on syntax, while it matches a fixed set of
characters in Grep. So when searching across different file types we
can't even "expand" such constructs into concrete characters to search for.
One approach I've considered is replacing such unsupported constructs
with '.', or removing them entirely for constructs like \< and \_<. And
then post-filter the resulting matches in Emacs.
For example, xref-references-in-directory uses a special case of this
approach. In the general case though, I worry users would sometimes
create regexps that result in an exponentially slow or just match-all
regexp being passed to Grep, which would never finish, for no obvious
reason.
Someone should try it, but it's a fair amount of work to handle all
supported constructs, and to catch all (most?) the regexps which we
can't support in this mode.
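The "loosen, then post-filter" idea above can be sketched roughly as follows. This is only an illustration in Python, not code from Emacs or xref: the names `loosen` and `search_then_filter` are hypothetical, the token lists cover just a handful of constructs, Python's `re` stands in for Grep, and the final check (which in Emacs would be a real regexp search in the buffer) is passed in as a plain callable.

```python
import re

# Illustrative subset of constructs whose meaning depends on the Emacs
# syntax table; a real translator would need to cover many more.
BOUNDARY_TOKENS = (r"\_<", r"\_>", r"\<", r"\>")
SYNTAX_CLASS_TOKENS = ("[[:word:]]", "[[:space:]]")


def loosen(emacs_regexp):
    """Return a looser pattern: boundaries dropped, syntax classes become '.'."""
    loose = emacs_regexp
    for token in BOUNDARY_TOKENS:
        loose = loose.replace(token, "")
    for token in SYNTAX_CLASS_TOKENS:
        loose = loose.replace(token, ".")
    return loose


def search_then_filter(lines, emacs_regexp, exact_match):
    """Collect candidates with the loose pattern, keep those EXACT_MATCH accepts."""
    coarse = re.compile(loosen(emacs_regexp))
    candidates = [line for line in lines if coarse.search(line)]
    return [line for line in candidates if exact_match(line)]


# Python's \b and \s stand in for the Emacs-side check (an assumption).
exact = re.compile(r"\bfoo\sbar").search
hits = search_then_filter(
    ["foo bar", "foo_bar", "xfoo bar"], r"\<foo[[:space:]]bar", exact)
```

The loose pattern "foo.bar" accepts all three sample lines, and only the post-filter narrows them down, which is the source of the performance worry: the external search may return far more than the user asked for. Plain string replacement also mishandles an escaped backslash followed by `<`, one of the many cases a real implementation would have to catch.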
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Sun, 29 Nov 2020 15:23:02 GMT)
Message #104 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> Date: Sun, 29 Nov 2020 04:30:14 +0200
>
> For instance, a number of character classes in Emacs regexps are
> dependent on the syntax table. Like [:word:], for instance.
>
> Even [:space:] is dependent on syntax, while it matches a fixed set of
> characters in Grep. So when searching across different file types we
> can't even "expand" such constructs into concrete characters to search for.
It isn't clear to me which interpretation users will want. I don't
think there's a single answer.
> Someone should try it, but it's a fair amount of work to handle all
> supported constructs, and to catch all (most?) the regexps which we
> can't support in this mode.
FWIW, I think this is much less important than the embedded newline
support.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 30 Nov 2020 02:26:02 GMT)
Message #107 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 24.11.2020 22:16, Eli Zaretskii wrote:
>> Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>> Date: Tue, 24 Nov 2020 21:43:22 +0200
>>
>> How about https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31796#23 ?
>
> The idea sounds fine to me.
>
>> Someone more familiar with existing ports of Grep on different systems
>> should weigh in on it.
>
> I don't think it's necessary. We just need to probe Grep for support
> of these switches, and then use it. The result cannot be worse than
> it is now.
Now that I've dug in a little, the situation seems difficult.
-Pz does work, but it forces Grep to consider the file as one long
string. As a consequence, if we ask it to output the line number, the
number will always be 1. That's not a helpful mode of operation.
Even if it worked differently, -P imposes a significant performance
penalty from what I see, even when the extra syntax is not actually
used. So we couldn't enable it by default.
There is a similar program called pcregrep which outputs in the expected
format:
$ pcregrep -MHn "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el:772: :type '(choice (const :tag "Read with
completion from relative names"
project--read-file-cpd-relative)
lisp/progmodes/project.el:774: (const :tag "Read with
completion from absolute names"
project--read-file-absolute)
...but it doesn't seem to have a way to reliably detect where a match
result ends. When we're talking multiline, perhaps the searched file
includes a string like "file-name/etc:number"? Some of our tests
probably do. Grep has a flag -Z (or --null) which adds a null byte
after file names, but pcregrep doesn't.
And anyway, pcregrep isn't usually installed by default.
ripgrep, OTOH, seems to combine both good features here:
$ rg -Hn --multiline --null "names\"\n *" lisp/progmodes/project.el
lisp/progmodes/project.el772: :type '(choice (const :tag "Read with
completion from relative names"
773: project--read-file-cpd-relative)
774: (const :tag "Read with completion from absolute names"
775: project--read-file-absolute)
And it also disables the multiline mode automatically if the regexp
can't match a newline (the multiline mode is significantly slower).
To sum up, there are options, but I don't see a working solution that is
based on GNU Grep. And that's the most portable search program we have,
I think.
The other recommendations I see (here:
https://unix.stackexchange.com/questions/112132/how-can-i-grep-patterns-across-multiple-lines)
include bespoke scripts in sed or perl in command mode. These seem less
portable, but if someone would like to try their hand at one that would
also output file names and line numbers in the expected format, I'd be
happy to benchmark it.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 30 Nov 2020 08:56:01 GMT)
Message #110 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> Now that I've dug in a little, the situation seems difficult.
>
> -Pz does work, but it forces Grep to consider the file as one long
> string. As a consequence, if we ask it to output the line number, the
> number will always be 1. That's not a helpful mode of operation.
>
> Even if it worked differently, -P imposes a significant performance penalty
> from what I see, even when the extra syntax is not actually used. So we
> couldn't enable it by default.
When a grep input pattern contains a newline, then xref could use
the same algorithm as is used for 'M-.', i.e. run 'grep -Pzl'
to get the file names that contain the pattern, then return
these file names without line numbers. This works exactly
like the new feature, recently proposed on emacs-devel, that extends
xref-show-xrefs-function with a new completion function
(BTW, why isn't it installed yet?)
So like this feature presenting such completions without line numbers:
lisp/progmodes/project.el:(cl-defgeneric project-root)
lisp/progmodes/project.el:(cl-defmethod project-root ((project (head transient))))
lisp/progmodes/project.el:(cl-defmethod project-root ((project (head vc))))
xref for grep could work the same way without line numbers:
lisp/progmodes/project.el:names"^Jproject--read-file-cpd-relative)
lisp/progmodes/project.el:names"^Jproject--read-file-absolute)
Then visiting such grep hit should use Emacs search functions
to find the grep hit in the visited file.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 30 Nov 2020 15:31:01 GMT)
Message #113 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> Date: Mon, 30 Nov 2020 04:25:31 +0200
>
> To sum up, there are options, but I don't see a working solution that is
> based on GNU Grep. And that's the most portable search program we have,
> I think.
Maybe we should say that if someone wants to be able to find multiline
regexp, they should install ripgrep?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 30 Nov 2020 15:41:01 GMT)
Message #116 received at 31796 <at> debbugs.gnu.org (full text, mbox):
* Eli Zaretskii <eliz <at> gnu.org> [2020-11-30 18:31]:
> > Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> > From: Dmitry Gutov <dgutov <at> yandex.ru>
> > Date: Mon, 30 Nov 2020 04:25:31 +0200
> >
> > To sum up, there are options, but I don't see a working solution that is
> > based on GNU Grep. And that's the most portable search program we have,
> > I think.
>
> Maybe we should say that if someone wants to be able to find multiline
> regexp, they should install ripgrep?
Does this help?
https://stackoverflow.com/questions/3717772/regex-grep-for-multi-line-search-needed#7167115
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 30 Nov 2020 15:46:02 GMT)
Message #119 received at 31796 <at> debbugs.gnu.org (full text, mbox):
* Eli Zaretskii <eliz <at> gnu.org> [2020-11-30 18:31]:
> > Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> > From: Dmitry Gutov <dgutov <at> yandex.ru>
> > Date: Mon, 30 Nov 2020 04:25:31 +0200
> >
> > To sum up, there are options, but I don't see a working solution that is
> > based on GNU Grep. And that's the most portable search program we have,
> > I think.
>
> Maybe we should say that if someone wants to be able to find multiline
> regexp, they should install ripgrep?
It is possible to combine with sed:
https://www.gnu.org/software/sed/manual/html_node/Text-search-across-multiple-lines.html
https://www.gnu.org/software/sed/manual/html_node/Multiline-techniques.html#Multiline-techniques
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Mon, 30 Nov 2020 16:37:01 GMT)
Message #122 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> Date: Mon, 30 Nov 2020 18:39:33 +0300
> From: Jean Louis <bugs <at> gnu.support>
> Cc: Dmitry Gutov <dgutov <at> yandex.ru>, abela <at> chalmers.se,
> 31796 <at> debbugs.gnu.org
>
> * Eli Zaretskii <eliz <at> gnu.org> [2020-11-30 18:31]:
> > > Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> > > From: Dmitry Gutov <dgutov <at> yandex.ru>
> > > Date: Mon, 30 Nov 2020 04:25:31 +0200
> > >
> > > To sum up, there are options, but I don't see a working solution that is
> > > based on GNU Grep. And that's the most portable search program we have,
> > > I think.
> >
> > Maybe we should say that if someone wants to be able to find multiline
> > regexp, they should install ripgrep?
>
> Does this help?
>
> https://stackoverflow.com/questions/3717772/regex-grep-for-multi-line-search-needed#7167115
I think this was already discussed up-thread?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 01 Dec 2020 01:24:02 GMT)
Message #125 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 30.11.2020 17:42, Jean Louis wrote:
> It is possible to combine with sed:
> https://www.gnu.org/software/sed/manual/html_node/Text-search-across-multiple-lines.html
>
> https://www.gnu.org/software/sed/manual/html_node/Multiline-techniques.html#Multiline-techniques
It's pretty much Chinese to me, sorry.
Can you write a sed search script like that that outputs in the expected
format?
Meaning,
FILE_NAME\0LINE_NUMBER_1:MATCH_LINE_1
...
LINE_NUMBER_N:MATCH_LINE_N
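For illustration, here is a rough sketch of a script producing records in that shape. It is written in Python rather than sed, uses Python's regexp syntax rather than Grep's or Emacs's, and the function name `format_matches` is hypothetical; nothing here has been benchmarked.

```python
import re


def format_matches(file_name, text, pattern):
    """Return records shaped like FILE_NAME\\0LINE:TEXT, then LINE:TEXT."""
    # re.MULTILINE only makes ^ and $ match at line boundaries.
    regex = re.compile(pattern, re.MULTILINE)
    lines = text.split("\n")
    records = []
    for match in regex.finditer(text):
        # A position's line index is the number of newlines before it.
        first = text.count("\n", 0, match.start())
        last_pos = max(match.start(), match.end() - 1)
        last = text.count("\n", 0, last_pos)
        for index in range(first, last + 1):
            # Only the first line of each match carries the file name and NUL.
            prefix = file_name + "\0" if index == first else ""
            records.append(f"{prefix}{index + 1}:{lines[index]}")
    return records


sample = 'a\n(const "names"\n   foo)\nb\n'
records = format_matches("project.el", sample, r'names"\n *')
```

Because the whole file is searched as one string, the line numbers are recovered by counting newlines before each match instead of relying on the search tool to report them.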
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 01 Dec 2020 01:25:01 GMT)
Message #128 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 30.11.2020 17:30, Eli Zaretskii wrote:
>> Cc:abela <at> chalmers.se,31796 <at> debbugs.gnu.org
>> From: Dmitry Gutov<dgutov <at> yandex.ru>
>> Date: Mon, 30 Nov 2020 04:25:31 +0200
>>
>> To sum up, there are options, but I don't see a working solution that is
>> based on GNU Grep. And that's the most portable search program we have,
>> I think.
> Maybe we should say that if someone wants to be able to find multiline
> regexp, they should install ripgrep?
We could do that, indeed.
Certainly better than not having that feature at all.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 01 Dec 2020 02:22:02 GMT)
Message #131 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 30.11.2020 10:49, Juri Linkov wrote:
>> Even if it worked differently, -P imposes a significant performance penalty
>> from what I see, even when the extra syntax is not actually used. So we
>> couldn't enable it by default.
>
> When a grep input pattern contains a newline, then xref could use
> the same algorithm as is used for 'M-.', i.e. run 'grep -Pzl'
> to get the file names that contain the pattern, then return
> these file names without line numbers.
Do you mean the xref items backed by find-func.el? Those are a
particular kind of references which are usually unique enough that
special navigation logic can work. It's also implemented this way
because the search can be performed without reading file contents (which
would be slower).
> This works exactly
> like the new feature, recently proposed on emacs-devel, that extends
> xref-show-xrefs-function with a new completion function
For Grep results, I think the line number is important because we're
even more likely to have multiple lines with the same contents in one file.
What we *could* do, is run Grep, then take just the list of file names
that it returns, visit them all in Emacs and repeat the search in all of
them. But that would require a more complex abstraction than just
"search command", as well as some juggling of buffers that weren't open
before (we don't want to add more open buffers just because the user has
run a search, right?).
I'm not sure cost/benefit is worth it, but if you'd like to try your
hand at writing it, please go ahead. Just let me add ripgrep support first.
BTW, that approach could fit project-search and
project-query-replace-regexp well, I think. Perhaps the dired-do-*
functions, too. Should improve their performance in a number of scenarios.
> (BTW, why isn't it installed yet?)
Waiting for the feedback.
It went through several minor revisions. Do you like the most recent
version? If so, please reply to the message containing it. If you don't,
please also reply and say why.
> So like this feature presenting such completions without line numbers:
>
> lisp/progmodes/project.el:(cl-defgeneric project-root)
> lisp/progmodes/project.el:(cl-defmethod project-root ((project (head transient))))
> lisp/progmodes/project.el:(cl-defmethod project-root ((project (head vc))))
>
> xref for grep could work the same way without line numbers:
>
> lisp/progmodes/project.el:names"^Jproject--read-file-cpd-relative)
> lisp/progmodes/project.el:names"^Jproject--read-file-absolute)
>
> Then visiting such grep hit should use Emacs search functions
> to find the grep hit in the visited file.
These are two substrings inside that file that matched the search
regexp. But there could be substrings in the same file that are equal to
either of these but don't match said regexp, e.g. because they are
preceded or followed by some different contents.
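A small illustration of that point, sketched in Python with its own regexp syntax (the helper names `find_all` and `regexp_vs_literal` are hypothetical): an anchored pattern matches one occurrence of a string, while a later literal search for the matched text finds an extra occurrence that the regexp itself rejects.

```python
import re


def find_all(text, needle):
    """Return the start offsets of every occurrence of NEEDLE in TEXT."""
    starts, pos = [], text.find(needle)
    while pos != -1:
        starts.append(pos)
        pos = text.find(needle, pos + 1)
    return starts


def regexp_vs_literal(text, pattern):
    """Compare where PATTERN matches with where its matched text occurs."""
    matches = list(re.compile(pattern, re.MULTILINE).finditer(text))
    regexp_starts = [m.start() for m in matches]
    matched_strings = {m.group(0) for m in matches if m.group(0)}
    literal_starts = sorted({pos for s in matched_strings
                             for pos in find_all(text, s)})
    return regexp_starts, literal_starts


sample = "foo = 1\nbar(foo)\n"
regexp_starts, literal_starts = regexp_vs_literal(sample, r"^foo")
```

Here the pattern matches only at offset 0, but the string "foo" also occurs at offset 12, inside "bar(foo)". Any pattern whose match depends on context outside the matched text (anchors, word boundaries, lookaround where supported) can behave this way.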
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 01 Dec 2020 05:21:01 GMT)
Message #134 received at 31796 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> To sum up, there are options, but I don't see a working solution that is
> based on GNU Grep.
Can people think of a new feature that would be easy to add to GNU grep
that would make it easy for Dired to handle all cases correctly?
I don't know what the problem is, but if it has to do with parsing the
grep output, here's an idea: an option to tell GNU grep to use quoting
on file names and the match strings, perhaps in the same way GNU ls
does.
Another idea is an option to output numerical byte positions in the
file instead of the lines that are matched. Emacs can feed those byte
positions into byte-to-position to convert them into buffer positions.
--
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 01 Dec 2020 08:56:02 GMT)
Message #137 received at 31796 <at> debbugs.gnu.org (full text, mbox):
>> When a grep input pattern contains a newline, then xref could use
>> the same algorithm as is used for 'M-.', i.e. run 'grep -Pzl'
>> to get the file names that contain the pattern, then return
>> these file names without line numbers.
>
> Do you mean the xref items backed by find-func.el? Those are a particular
> kind of references which are usually unique enough that special navigation
> logic can work. It's also implemented this way because the search can be
> performed without reading file contents (which would be slower).
I meant xref-matches-in-files. It could also use another regexp
for the output of 'grep -Pzo' without line numbers.
>> This works exactly
>> like the new feature, recently proposed on emacs-devel, that extends
>> xref-show-xrefs-function with a new completion function
>
> For Grep results, I think the line number is important because we're even
> more likely to have multiple lines with the same contents in one file.
Yes, sometimes this might cause inconvenience when the user wants to visit
the second occurrence of exactly the same line.
> What we *could* do, is run Grep, then take just the list of file names
> that it returns, visit them all in Emacs and repeat the search in all of
> them. But that would require a more complex abstraction than just "search
> command", as well as some juggling of buffers that weren't open before (we
> don't want to add more open buffers just because the user has run a search,
> right?).
dired-do-find-regexp uses 'ignores' to filter out ignored files.
You could add another filter to filter out files without matches
using 'grep -PzL'.
>> (BTW, why isn't it installed yet?)
>
> Waiting for the feedback.
>
> It went through several minor revisions. Do you like the most recent
> version? If so, please reply to the message containing it. If you don't,
> please also reply and say why.
I suggest creating a new bug number for it.
>> So like this feature presenting such completions without line numbers:
>> lisp/progmodes/project.el:(cl-defgeneric project-root)
>> lisp/progmodes/project.el:(cl-defmethod project-root ((project (head transient))))
>> lisp/progmodes/project.el:(cl-defmethod project-root ((project (head vc))))
>> xref for grep could work the same way without line numbers:
>> lisp/progmodes/project.el:names"^Jproject--read-file-cpd-relative)
>> lisp/progmodes/project.el:names"^Jproject--read-file-absolute)
>> Then visiting such grep hit should use Emacs search functions
>> to find the grep hit in the visited file.
>
> These are two substrings inside that file that matched the search
> regexp. But there could be substrings in the same file that are equal to
> either of these but don't match said regexp, e.g. because they are preceded
> or followed by some different contents.
How is this possible? Please show examples.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 01 Dec 2020 08:56:02 GMT)
Message #140 received at 31796 <at> debbugs.gnu.org (full text, mbox):
>> It is possible to combine with sed:
>> https://www.gnu.org/software/sed/manual/html_node/Text-search-across-multiple-lines.html
>> https://www.gnu.org/software/sed/manual/html_node/Multiline-techniques.html#Multiline-techniques
>
> It's pretty much Chinese to me, sorry.
When I need to grep in multi-line mode I use Ruby, but its modifiers
differ from Perl:
https://regular-expressions.mobi/ruby.html
/m makes the dot match newlines. Ruby indeed uses /m, whereas Perl and
many other programming languages use /s for “dot matches newlines”.
https://www.regular-expressions.info/modifiers.html
(?s) for “single line mode” makes the dot match all characters,
including line breaks. Not supported by Ruby or JavaScript.
(?m) for “multi-line mode” makes the caret and dollar match at the start
and end of each line in the subject string. In Ruby, (?m) makes the
dot match all characters, without affecting the caret and dollar which
always match at the start and end of each line in Ruby.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 01 Dec 2020 15:21:01 GMT)
Message #143 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 01.12.2020 10:36, Juri Linkov wrote:
>>> It is possible to combine with sed:
>>> https://www.gnu.org/software/sed/manual/html_node/Text-search-across-multiple-lines.html
>>> https://www.gnu.org/software/sed/manual/html_node/Multiline-techniques.html#Multiline-techniques
>>
>> It's pretty much Chinese to me, sorry.
>
> When I need to grep in multi-line mode I use Ruby, but its modifiers
> differ from Perl:
>
> https://regular-expressions.mobi/ruby.html
> /m makes the dot match newlines. Ruby indeed uses /m, whereas Perl and
> many other programming languages use /s for “dot matches newlines”.
>
> https://www.regular-expressions.info/modifiers.html
> (?s) for “single line mode” makes the dot match all characters,
> including line breaks. Not supported by Ruby or JavaScript.
> (?m) for “multi-line mode” makes the caret and dollar match at the start
> and end of each line in the subject string. In Ruby, (?m) makes the
> dot match all characters, without affecting the caret and dollar which
> always match at the start and end of each line in Ruby.
Ruby's much easier for me, of course, but it doesn't have the same
advantage of ubiquity that awk (and, to a lesser extent, perl) have.
Either way, someone would need to write that script.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Tue, 01 Dec 2020 15:47:02 GMT)
Message #146 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> From: Richard Stallman <rms <at> gnu.org>
> Cc: eliz <at> gnu.org, abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> Date: Tue, 01 Dec 2020 00:20:12 -0500
>
> Can people think of a new feature that would be easy to add to GNU grep
> that would make it easy for Dired to handle all cases correctly?
Yes: it should detect encoding of each input file (and have a way of
letting the user specify encoding for each file), convert the file's
contents to some internal encoding (probably UTF-8), then report the
hits encoded in UTF-8, regardless of the file's original encoding (and
regardless of the current locale's codeset).
> I don't know what the problem is, but if it has to do with parsing the
> grep output, here's an idea: an option to tell GNU grep to use quoting
> on file names and the match strings, Perhaps in the same way GNU ls
> does.
The problem is not with file names, it's with the matches. But since
you mention it: Grep should, in this new mode, report file names also
recoded into UTF-8. In a word, it should arrange for its output to be in
a single encoding known in advance, so that front ends like Emacs
won't need to guess the encoding.
> Another idea is an option to output numerical byte positions in the
> file instead of the lines that are matched. Emacs can feed those byte
> positions into byte-to-position to convert them into buffer positions.
AFAIU, there's already such an option: -b. However, byte-to-position
works only with UTF-8 encoded files; we need filepos-to-bufferpos
(which requires knowing the file's encoding, so we are back at the
same problem).
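The dependency on the encoding can be illustrated with a small sketch. The helper name `byte_offset_to_char_index` and the sample text are assumptions made for this example; this is not how filepos-to-bufferpos is implemented, only a demonstration that the same character position corresponds to different byte offsets under different encodings.

```python
def byte_offset_to_char_index(raw: bytes, byte_offset: int, encoding: str) -> int:
    """Return the character index corresponding to byte_offset in raw.

    Assumes byte_offset falls on a character boundary in the given encoding.
    """
    # Decoding only the prefix counts the characters that precede the offset.
    return len(raw[:byte_offset].decode(encoding))


text = "naïve multi\nline"
offsets = {}
positions = {}
for enc in ("utf-8", "latin-1", "utf-16-le"):
    raw = text.encode(enc)
    # Byte offset of the match, as a byte-oriented tool would report it.
    offsets[enc] = raw.find("multi".encode(enc))
    positions[enc] = byte_offset_to_char_index(raw, offsets[enc], enc)
```

The byte offsets differ per encoding while the character index is the same, which is why a byte position alone is not enough without knowing how the file is encoded.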
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 02 Dec 2020 04:27:02 GMT)
Message #149 received at 31796 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> AFAIU, there's already such an option: -b. However, byte-to-position
> works only with UTF-8 encoded files; we need filepos-to-bufferpos
Oops.
> (which requires knowing the file's encoding, so we are back at the
> same problem).
If you're going to look at the contents of the file, you have to
visit it, which means you'll know which encoding to use for that file.
Does that make it work?
--
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 02 Dec 2020 14:57:01 GMT)
Message #152 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> From: Richard Stallman <rms <at> gnu.org>
> Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org, dgutov <at> yandex.ru
> Date: Tue, 01 Dec 2020 23:26:10 -0500
>
> > AFAIU, there's already such an option: -b. However, byte-to-position
> > works only with UTF-8 encoded files; we need filepos-to-bufferpos
>
> Oops.
>
> > (which requires knowing the file's encoding, so we are back at the
> > same problem).
>
> If you're going to look at the contents of the file, you have to
> visit it, which means you'll know which encoding to use for that file.
The point is that our heuristics for detecting encoding are not
perfect, so they could fail.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 02 Dec 2020 17:18:01 GMT)
Message #155 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 02.12.2020 16:56, Eli Zaretskii wrote:
> The point is that our heuristics for detecting encoding are not
> perfect, so they could fail.
Do you imagine Grep could use a more reliable detection algorithm?
Although... since it has to scan the full file anyway, it could first do
a quick detection, and then maybe rescan from the beginning if the
encoding turns out to be something else.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 02 Dec 2020 17:41:02 GMT)
Message #158 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> Date: Wed, 2 Dec 2020 19:17:06 +0200
>
> On 02.12.2020 16:56, Eli Zaretskii wrote:
> > The point is that our heuristics for detecting encoding are not
> > perfect, so they could fail.
>
> Do you imagine Grep could use a more reliable detection algorithm?
No, I don't. But it could allow the user to specify a different
encoding for each file, as in
grep --encoding=FOO FILES1* --encoding=BAR FILES2*
etc. And even if it just did the job of the same quality as we do, it
will do it faster, which is why we use Grep in the first place, right?
The important part of the "enhancement" I described is actually the
fact that the output gets encoded in a single encoding, no matter what
was the encoding of the original files. This makes reading and
decoding the output simple and always correct.
> Although... since it has to scan the full file anyway, it could first do
> a quick detection, and then maybe rescan from the beginning if the
> encoding turns out to be something else.
That'd be too late, as some matches were already output.
Grep does begin by scanning a small portion of the file (at least it
did, back when I was familiar with its code), so detection in the same
style as Emacs does should be a natural addition, I think.
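The idea of a per-file input encoding combined with a single output encoding can be sketched as follows. This is only an illustration of the proposal: the function `search_recoded` and the in-memory sample files are hypothetical, and the code models neither Grep's implementation nor the suggested --encoding option.

```python
import re
from typing import Iterable, Iterator, Tuple


def search_recoded(files: Iterable[Tuple[str, bytes, str]], pattern: str) -> Iterator[bytes]:
    """Yield 'name:line:text' hits as UTF-8 bytes, whatever the input encoding.

    Each item in files is (name, raw contents, encoding specified by the caller).
    """
    regex = re.compile(pattern)
    for name, raw, encoding in files:
        # Decode with the per-file encoding, then search decoded text.
        for lineno, line in enumerate(raw.decode(encoding).splitlines(), start=1):
            if regex.search(line):
                # All output is emitted in one encoding known in advance.
                yield f"{name}:{lineno}:{line}".encode("utf-8")


files = [
    ("a.txt", "héllo\nworld\n".encode("latin-1"), "latin-1"),
    ("b.txt", "say héllo\n".encode("utf-8"), "utf-8"),
]
hits = list(search_recoded(files, "héllo"))
```

Because every hit is already UTF-8, a front end can decode the whole output stream with one coding system instead of guessing per file.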
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 02 Dec 2020 17:45:01 GMT)
Message #161 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 02.12.2020 19:39, Eli Zaretskii wrote:
>> Cc: abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>> Date: Wed, 2 Dec 2020 19:17:06 +0200
>>
>> On 02.12.2020 16:56, Eli Zaretskii wrote:
>>> The point is that our heuristics for detecting encoding are not
>>> perfect, so they could fail.
>>
>> Do you imagine Grep could use a more reliable detection algorithm?
>
> No, I don't. But it could allow the user to specify a different
> encoding for each file, as in
>
> grep --encoding=FOO FILES1* --encoding=BAR FILES2*
Not sure we can call it like that in an automated fashion (i.e. in
project-find-regexp). But hey, somebody else could.
> etc. And even if it just did the job of the same quality as we do, it
> will do it faster, which is why we use Grep in the first place, right?
That's true.
> The important part of the "enhancement" I described is actually the
> fact that the output gets encoded in a single encoding, no matter what
> was the encoding of the original files. This makes reading and
> decoding the output simple and always correct.
Yes, OK.
>> Although... since it has to scan the full file anyway, it could first do
>> a quick detection, and then maybe rescan from the beginning if the
>> encoding turns out to be something else.
>
> That'd be too late, as some matches were already output.
It could buffer them until the full file has been parsed. Encoding
detection and conversion must add a certain overhead anyway, so I'm not
sure how expensive the extra buffering would be in comparison.
As a bonus, per-file buffering like that would allow easier
parallelization of searches.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 02 Dec 2020 17:49:02 GMT)
Message #164 received at 31796 <at> debbugs.gnu.org (full text, mbox):
> Cc: rms <at> gnu.org, abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> Date: Wed, 2 Dec 2020 19:43:52 +0200
>
> >> Although... since it has to scan the full file anyway, it could first do
> >> a quick detection, and then maybe rescan from the beginning if the
> >> encoding turns out to be something else.
> >
> > That'd be too late, as some matches were already output.
>
> It could buffer them until the full file has been parsed. Encoding
> detection and conversion must add a certain overhead anyway, so I'm not
> sure how expensive the extra buffering would be in comparison.
>
> As a bonus, per-file buffering like that would allow easier
> parallelization of searches.
Buffering means you don't output matches as soon as you find them,
which might be regarded as a kind of regression -- see Richard's bug
reports a few days ago. And since you never know where in the file
the telltale byte sequences will appear, you will need to always wait
until the entire file is read -- which could be prohibitive for very
large files.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Thu, 03 Dec 2020 02:24:02 GMT)
Message #167 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 01.12.2020 07:20, Richard Stallman wrote:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> > To sum up, there are options, but I don't see a working solution that is
> > based on GNU Grep.
>
> Can people think of a new feature that would be easy to add to GNU grep
> that would make it easy for Dired to handle all cases correctly?
>
> I don't know what the problem is, but if it has to do with parsing the
> grep output, here's an idea: an option to tell GNU grep to use quoting
> on file names and the match strings, Perhaps in the same way GNU ls
> does.
Grep already has that, more or less, with --null. pcregrep doesn't
(which was my other example).
What Grep could add, however, is a "multiline" matching mode similar to
what pcregrep and ripgrep have. Meaning, it would allow matches to cross
newlines (with certain rules on whether "." matches a newline) but
without requiring the -z mode. So it would still report correct line
numbers for the matches.
> Another idea is an option to output numerical byte positions in the
> file instead of the lines that are matched. Emacs can feed those byte
> positions into byte-to-position to convert them into buffer positions.
Like Eli said, that's -b.
But considering Emacs would have to visit each file, to post-process the
results with byte-to-position, this might turn out to be not much faster
or easier to implement than simply visiting every file that (according
to Grep) has matches and repeating the search in Emacs.
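The kind of "multiline" mode described above, where matches may cross newlines but line numbers are still reported, can be sketched in a few lines. The function name `multiline_matches` and the sample text are assumptions for illustration; this shows the idea only and says nothing about how pcregrep or ripgrep implement it.

```python
import re


def multiline_matches(text: str, pattern: str):
    """Return (start_line, end_line, matched_text) for each match in text."""
    results = []
    for m in re.finditer(pattern, text):
        # Line of the first matched character: newlines before it, plus one.
        start_line = text.count("\n", 0, m.start()) + 1
        # Line of the last matched character (or of the start, if empty).
        last = max(m.end() - 1, m.start())
        end_line = text.count("\n", 0, last) + 1
        results.append((start_line, end_line, m.group(0)))
    return results


sample = "first\nmulti\nline\nlast\n"
found = multiline_matches(sample, r"multi\s+line")
```

The search runs over the whole text rather than line by line, yet each hit still carries the line range it occupies.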
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Thu, 03 Dec 2020 02:47:02 GMT)
Message #170 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 01.12.2020 10:39, Juri Linkov wrote:
>>> When a grep input pattern contains a newline, then xref could use
>>> the same algorithm as is used for 'M-.', i.e. run 'grep -Pzl'
>>> to get the file names that contain the pattern, then return
>>> these file names without line numbers.
>>
>> Do you mean the xref items backed by find-func.el? Those are a particular
>> kind of reference, usually unique enough that special navigation
>> logic can work. It's also implemented this way because the search can be
>> performed without reading file contents (which would be slower).
>
> I meant xref-matches-in-files.
'M-.' doesn't use xref-matches-in-files.
> It could also use another regexp
> for the output of 'grep -Pzo' without line numbers.
Not 100% sure I understand you here, but hopefully this line of
discussion is continued below.
>>> This works exactly
>>> like the new feature, proposed recently on emacs-devel, of extending
>>> xref-show-xrefs-function with a new completion function
>>
>> For Grep results, I think the line number is important because we're even
>> more likely to have multiple lines with the same contents in one file.
>
> Yes, sometimes this might cause inconvenience when the user wants to visit
> the second occurrence of exactly the same line.
Or 5th or 10th. Where this would be more important, though, is when the
user will want to change all these lines at once with
xref-query-replace-in-results.
Also, it'd probably be surprising to see Grep search results without
line numbers.
>> What we *could* do is run Grep, then take just the list of file names
>> that it returns, visit them all in Emacs and repeat the search in all of
>> them. But that would require a more complex abstraction than just "search
>> command", as well as some juggling of buffers that weren't open before (we
>> don't want to add more open buffers just because the user has run a search,
>> right?).
>
> dired-do-find-regexp uses 'ignores' to filter out ignored files.
> You could add another filter to filter out files without matches
> using 'grep -PzL'.
Right. This is sorta a backup plan. Although, when the number of files
to search can be counted on one hand, there's nothing too bad in doing
the search in Emacs.
>>> (BTW, why it's not installed yet?)
>>
>> Waiting for the feedback.
>>
>> It went through several minor revisions. Do you like the most recent
>> version? If so, please reply to the message containing it. If you don't,
>> please also reply and say why.
>
> I suggest to create a new bug-number for it.
If you think it's best. The original thread author decided to write to
emacs-devel, maybe they're more comfortable there. *shrug*
>>> So like this feature presenting such completions without line numbers:
>>> lisp/progmodes/project.el:(cl-defgeneric project-root)
>>> lisp/progmodes/project.el:(cl-defmethod project-root ((project (head transient))))
>>> lisp/progmodes/project.el:(cl-defmethod project-root ((project (head vc))))
>>> xref for grep could work the same way without line numbers:
>>> lisp/progmodes/project.el:names"^Jproject--read-file-cpd-relative)
>>> lisp/progmodes/project.el:names"^Jproject--read-file-absolute)
>>> Then visiting such grep hit should use Emacs search functions
>>> to find the grep hit in the visited file.
>>
>> These are two substrings inside that file that matched the search
>> regexp. But there could be substrings in the same file that are equal to
>> either of these but don't match said regexp, e.g. because they are preceded
>> or followed by some different contents.
>
> How is this possible? Please show examples.
Hmm, apparently no examples are possible with Grep (which treats all lines
as independent strings), but if we take ripgrep, or other regexp
engines, they can use anchors like \A (counterpart to \` in Emacs), or
PCRE's lookahead/lookbehind. As long as dired-do-find-regexp is
documented to simply "use constructs supported by your local [search]
command", the user could take advantage of some advanced syntax like that.
Though we might have to limit that capability if the idea of
post-filtering search results using Emacs's own engine comes to life.
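A concrete instance of the lookbehind case, sketched with Python's `re` module (an illustration only; the sample text is made up). The regexp matches one occurrence of "bar", but a later plain-text search for the matched substring lands on a different, non-matching occurrence.

```python
import re

text = "xbar\nfoobar\n"

# The lookbehind makes the match depend on text outside the matched substring.
m = re.search(r"(?<=foo)bar", text)
real_pos = m.start()                # where the regexp actually matched

# Locating the hit afterwards by searching for the matched string alone
# finds the earlier "bar" in "xbar", which the regexp never matched.
naive_pos = text.find(m.group(0))
```

So a hit reported without a position cannot always be relocated by searching for its text.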
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Thu, 03 Dec 2020 05:28:02 GMT)
Message #173 received at 31796 <at> debbugs.gnu.org (full text, mbox):
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Subject: bug#31796: 27.1;
> dired-do-find-regexp-and-replace fails to find multiline regexps
> Resent-From: Eli Zaretskii <eliz <at> gnu.org>
> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
> Resent-CC: bug-gnu-emacs <at> gnu.org
> Resent-Sender: help-debbugs <at> gnu.org
> To: Dmitry Gutov <dgutov <at> yandex.ru>
> Date: Wed, 02 Dec 2020 19:47:43 +0200
> Message-Id: <83wny0f6bk.fsf <at> gnu.org>
> From: Eli Zaretskii <eliz <at> gnu.org>
> In-Reply-To: <0646a65f-db21-b377-6897-caeb6ff3e10c <at> yandex.ru> (message from
> Dmitry Gutov on Wed, 2 Dec 2020 19:43:52 +0200)
> Cc: abela <at> chalmers.se, rms <at> gnu.org, 31796 <at> debbugs.gnu.org
> > Cc: rms <at> gnu.org, abela <at> chalmers.se, 31796 <at> debbugs.gnu.org
> > From: Dmitry Gutov <dgutov <at> yandex.ru>
> > Date: Wed, 2 Dec 2020 19:43:52 +0200
> >
> > >> Although... since it has to scan the full file anyway, it could first do
> > >> a quick detection, and then maybe rescan from the beginning if the
> > >> encoding turns out to be something else.
> > >
> > > That'd be too late, as some matches were already output.
> >
> > It could buffer them until the full file has been parsed. Encoding
> > detection and conversion must add a certain overhead anyway, so I'm not
> > sure how expensive the extra buffering would be in comparison.
> >
> > As a bonus, per-file buffering like that would allow easier
> > parallelization of searches.
> Buffering means you don't output matches as soon as you find them,
> which might be regarded as a kind of regression -- see Richard's bug
> reports a few days ago. And since you never know where in the file
> the telltale byte sequences will appear, you will need to always wait
> until the entire file is read -- which could be prohibitive for very
> large files.
In my case, I was definitely going to wait until the search finished,
to see all the responses.
But it is much easier to look at them if they come out one by one,
rather than all at once due to buffering.
--
Dr Richard Stallman
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Sun, 06 Dec 2020 21:17:01 GMT)
Message #176 received at 31796 <at> debbugs.gnu.org (full text, mbox):
>> dired-do-find-regexp uses 'ignores' to filter out ignored files.
>> You could add another filter to filter out files without matches
>> using 'grep -PzL'.
>
> Right. This is sorta a backup plan. Although, when the number of files to
> search can be counted on one hand, there's nothing too bad in doing the
> search in Emacs.
Another backup plan is to use ripgrep. Its multiline handling with -U
also allows searching for words while ignoring any whitespace, even newlines.
This is like isearch-lax-whitespace using search-whitespace-regexp
when it contains a newline, e.g. "[ \t\r\n]+".
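The lax-whitespace idea can be sketched as a small transformation from a plain query to a regexp. The helper `lax_whitespace_regexp` is hypothetical and only mirrors the effect described above; it is not the isearch implementation.

```python
import re


def lax_whitespace_regexp(query: str, whitespace: str = r"[ \t\r\n]+") -> str:
    """Build a regexp matching the words of query separated by any whitespace."""
    # Escape each word so regexp metacharacters in the query stay literal,
    # then join the words with the whitespace class, which includes newlines.
    return whitespace.join(re.escape(word) for word in query.split())


pattern = lax_whitespace_regexp("file names")
same_line = re.search(pattern, "list of file names here")
across_lines = re.search(pattern, "list of file\n    names here")
```

A regexp of this shape only finds the second case if the search tool lets matches cross line boundaries.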
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 16 Dec 2020 03:01:01 GMT)
Message #179 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 06.12.2020 23:00, Juri Linkov wrote:
>>> dired-do-find-regexp uses 'ignores' to filter out ignored files.
>>> You could add another filter to filter out files without matches
>>> using 'grep -PzL'.
>> Right. This is sorta a backup plan. Although, when the number of files to
>> search can be counted on one hand, there's nothing too bad in doing the
>> search in Emacs.
> Another backup plan is to use ripgrep. Its multiline handling with -U
> also allows searching for words while ignoring any whitespace, even newlines.
> This is like isearch-lax-whitespace using search-whitespace-regexp
> when it contains a newline, e.g. "[ \t\r\n]+".
Right. It has a problem of its own, though: it still outputs a file name
per line, even when a match is spread across several lines (unlike
pcregrep). So we're left guessing where a given multiline match ends.
Also, 'sort' doesn't seem to be able to treat both : and \0 as
separators at the same time.
Here's a rough patch, for illustration. It's kind of working, but I'm
not loving it.
[ripgrep-multiline.diff (text/x-patch, attachment)]
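The guessing problem with one output line per matched line can be sketched as follows. This is not the attached patch: the function `merge_adjacent` and the sample hits are hypothetical, and the code simply groups hits on consecutive lines of the same file.

```python
from typing import List, Tuple


def merge_adjacent(hits: List[Tuple[str, int, str]]) -> List[Tuple[str, int, int, str]]:
    """Group (path, line, text) hits on consecutive lines into single entries."""
    merged: List[Tuple[str, int, int, str]] = []
    for path, lineno, text in hits:
        if merged and merged[-1][0] == path and merged[-1][2] + 1 == lineno:
            # Same file and the very next line: assume it continues the match.
            prev = merged[-1]
            merged[-1] = (path, prev[1], lineno, prev[3] + "\n" + text)
        else:
            merged.append((path, lineno, lineno, text))
    return merged


hits = [
    ("project.el", 10, 'names"'),
    ("project.el", 11, "project--read-file-absolute)"),
    ("project.el", 40, "file"),
    ("project.el", 41, "names"),
    ("project.el", 42, "file names"),  # a separate match on the next line
]
groups = merge_adjacent(hits)
```

Lines 40 to 42 end up in one group even though they hold two distinct matches, which is exactly the ambiguity of this output format.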
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Wed, 16 Dec 2020 21:06:02 GMT)
Message #182 received at 31796 <at> debbugs.gnu.org (full text, mbox):
>> Another backup plan is to use ripgrep. Its multiline handling with -U
>> also allows searching for words while ignoring any whitespace, even newlines.
>> This is like isearch-lax-whitespace using search-whitespace-regexp
>> when it contains a newline, e.g. "[ \t\r\n]+".
>
> Right. It has a problem of its own, though: it still outputs a file name
> per line, even when a match is spread across several lines (unlike
> pcregrep). So we're left guessing where a given multiline match ends.
>
> Also, 'sort' doesn't seem to be able to treat both : and \0 as separators
> at the same time.
>
> Here's a rough patch, for illustration.
Thanks, now finally it's possible to search text ignoring whitespace
between words, for example:
Find regexp: file[
]+names
finds everything correctly, even though the current implementation may
not be the most elegant.
> It's kind of working, but I'm not loving it.
What do you think about using the option `rg --json`?
Emacs has the fast JSON parsing library now, so using
JSON output would be more reliable.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31796; Package emacs. (Thu, 17 Dec 2020 00:41:01 GMT)
Message #185 received at 31796 <at> debbugs.gnu.org (full text, mbox):
On 16.12.2020 22:32, Juri Linkov wrote:
>>> Another backup plan is to use ripgrep. Its multiline handling with -U
>>> also allows searching for words while ignoring any whitespace, even newlines.
>>> This is like isearch-lax-whitespace using search-whitespace-regexp
>>> when it contains a newline, e.g. "[ \t\r\n]+".
>>
>> Right. It has a problem of its own, though: it still outputs a file name
>> per line, even when a match is spread across several lines (unlike
>> pcregrep). So we're left guessing where a given multiline match ends.
>>
>> Also, 'sort' doesn't seem to be able to treat both : and \0 as separators
>> at the same time.
>>
>> Here's a rough patch, for illustration.
>
> Thanks, now finally it's possible to search text ignoring whitespace
> between words, for example:
>
> Find regexp: file[
> ]+names
>
> finds everything correctly, even though the current implementation may
> not be the most elegant.
>
>> It's kind of working, but I'm not loving it.
>
> What do you think about using the option `rg --json`?
> Emacs has the fast JSON parsing library now, so using
> JSON output would be more reliable.
Very interesting. It returns better data, each multiline match is wholly
in one entry instead of being spread across lines. Even the matches are
annotated with match string/length/absolute position.
We should really investigate it, but perhaps a bit later, including our
capability to parse it quickly when there are a lot of matches (>1000),
and how said byte offsets interact with different file encodings.
Also, its output is not one JSON document but a series of them
(including ones with just search statistics which we'll want to skip),
but some re-search-forward followed by (json-parse-buffer) should do the
trick.
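The "series of JSON documents" handling can be sketched like this, with Python standing in for the Emacs side. The sample messages are hand-written to resemble the stream as I understand it (one JSON object per line, with "begin", "match", "end" and "summary" types); the exact field names are assumptions and should be checked against ripgrep's documentation before relying on them.

```python
import json

# Hand-written sample resembling the message stream; not real tool output.
sample_output = "\n".join([
    json.dumps({"type": "begin", "data": {"path": {"text": "project.el"}}}),
    json.dumps({"type": "match", "data": {
        "path": {"text": "project.el"},
        "lines": {"text": "file\nnames\n"},
        "line_number": 40,
        "absolute_offset": 1200,
        "submatches": [{"match": {"text": "file\nnames"}, "start": 0, "end": 10}],
    }}),
    json.dumps({"type": "end", "data": {"path": {"text": "project.el"}}}),
    json.dumps({"type": "summary", "data": {}}),
])


def parse_matches(output: str):
    """Return (path, line_number, matched_text) for every submatch."""
    results = []
    for line in output.splitlines():
        if not line.strip():
            continue
        message = json.loads(line)   # each line is its own JSON document
        if message.get("type") != "match":
            continue                 # skip begin/end/summary messages
        data = message["data"]
        for sub in data["submatches"]:
            results.append((data["path"]["text"], data["line_number"], sub["match"]["text"]))
    return results


matches = parse_matches(sample_output)
```

Each multiline match arrives as one entry, so no merging of adjacent lines is needed in this format.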
In the meantime, here's a smaller patch using the traditional output
format. I figure since there is a file name on each line anyway, --null
doesn't help much. So it can be simplified a little (see attached).
Unfortunately, xref-replace-in-matches is broken for such multiline
matches. And, of course, it merges together matches on adjacent lines,
whether they are one match or several (that hasn't changed from the
previous patch). So more investigation is needed.
[ripgrep-multiline.diff (text/x-patch, attachment)]
This bug report was last modified 4 years and 245 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.