GNU bug report logs - #31698
27.0; `rx' help: Show equivalent regexp constructs

Package: emacs;

Reported by: Drew Adams <drew.adams <at> oracle.com>

Date: Sun, 3 Jun 2018 17:02:01 UTC

Severity: wishlist

Merged with 36496

Found in version 27.0

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31698 in the body.
You can then email your comments to 31698 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#31698; Package emacs. (Sun, 03 Jun 2018 17:02:01 GMT)

Acknowledgement sent to Drew Adams <drew.adams <at> oracle.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 03 Jun 2018 17:02:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0; `rx' help: Show equivalent regexp constructs
Date: Sun, 3 Jun 2018 10:01:07 -0700 (PDT)

Help for `rx' could use some improvement.

1. There seems to be no other help for `rx' than `C-h f rx'.  Nothing in
   the Elisp manual, for instance.  Perhaps it should have its own
   manual.  Or perhaps it should be documented in the Elisp manual (?).
   It's hard to imagine someone trying to learn the use of `rx' just by
   looking at `C-h f rx'.  Emacs should try to do better.

2. Please document (in the doc string of `rx', if nowhere else) the
   correspondences between each of the `rx' constructs and regexp
   syntax.  At least please document the most important ones.  For
   example, `zero-or-more' presumably corresponds to postfix regexp char
   `*'.

3. Please consider reordering the doc-string text to cover more commonly
   used and more important constructs before those less likely to be
   used.  E.g., `not', `and', and `or', seem more common and more
   important than `category'.
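
As a concrete illustration of the kind of correspondence requested in
item 2, the following sketch evaluates a few `rx' forms and shows the
regexp strings they expand to.  It assumes only the standard `rx' macro
shipped with Emacs; the constructs chosen are illustrative and are not
proposed doc-string text.

```elisp
;; Each form returns the regexp string that `rx' generates.
(rx (zero-or-more "a"))   ; => "a*"
(rx (one-or-more "a"))    ; => "a+"
(rx (zero-or-more "ab"))  ; => "\\(?:ab\\)*"  (shy group added automatically)
(rx (group "a"))          ; => "\\(a\\)"
(rx (seq "foo" "bar"))    ; => "foobar"
```

Reading the table right to left gives the regexp-to-`rx' direction:
postfix `*' is `zero-or-more', `\(...\)' is `group', and plain
concatenation is `seq'.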

In GNU Emacs 27.0.50 (build 3, x86_64-w64-mingw32)
 of 2018-03-21
Repository revision: e70d0c9e66d7a8609450b2889869d16aeb0363b5
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --without-dbus --host=x86_64-w64-mingw32
 --without-compress-install -C 'CFLAGS=-O2 -static -g3''




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31698; Package emacs. (Sun, 03 Jun 2018 17:15:02 GMT)

Message #8 received at 31698 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 31698 <at> debbugs.gnu.org
Subject: Re: bug#31698: 27.0; `rx' help: Show equivalent regexp constructs
Date: Sun, 03 Jun 2018 20:14:02 +0300

> Date: Sun, 3 Jun 2018 10:01:07 -0700 (PDT)
> From: Drew Adams <drew.adams <at> oracle.com>
> 
> Help for `rx' could use some improvement.

FWIW, I disagree.  I consider the doc string of 'rx' almost perfect,
it's an example that people should learn from.

> 1. There seems to be no other help for `rx' than `C-h f rx'.  Nothing in
>    the Elisp manual, for instance.  Perhaps it should have its own
>    manual.  Or perhaps it should be documented in the Elisp manual (?).
>    It's hard to imagine someone trying to learn the use of `rx' just by
>    looking at `C-h f rx'.  Emacs should try to do better.

Given its not-so-widespread use (and even outright critique of its
very raison d'être), I see no need to describe this in the manual.  If
and when its use becomes more widespread, we could consider that.  For
now, it will just bloat the manual.

> 2. Please document (in the doc string of `rx', if nowhere else) the
>    correspondences between each of the `rx' constructs and regexp
>    syntax.  At least please document the most important ones.  For
>    example, `zero-or-more' presumably corresponds to postfix regexp char
>    `*'.

Really?  Doesn't "zero-or-more" define the effect as clearly as
possible?  I think it does.

> 3. Please consider reordering the doc-string text to cover more commonly
>    used and more important constructs before those less likely to be
>    used.  E.g., `not', `and', and `or', seem more common and more
>    important than `category'.

"Important" is in the eyes of the beholder.  I don't see why the
current order is wrong.  If anything, it starts from "atoms" and moves
to "expressions", which is IMO no less important than any other
"importance" grade.

Having said all that, if someone wants to work on this and thinks they
can improve on the current state of affairs, feel free.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31698; Package emacs. (Sun, 03 Jun 2018 17:54:01 GMT)

Message #11 received at 31698 <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 31698 <at> debbugs.gnu.org
Subject: RE: bug#31698: 27.0; `rx' help: Show equivalent regexp constructs
Date: Sun, 3 Jun 2018 10:53:29 -0700 (PDT)

> > Help for `rx' could use some improvement.
> 
> FWIW, I disagree.  I consider the doc string of 'rx' almost perfect,
> it's an example that people should learn from.
> 
> > 1. There seems to be no other help for `rx' than `C-h f rx'.  Nothing
> >    in the Elisp manual, for instance.  Perhaps it should have its own
> >    manual.  Or perhaps it should be documented in the Elisp manual (?).
> >    It's hard to imagine someone trying to learn the use of `rx' just by
> >    looking at `C-h f rx'.  Emacs should try to do better.
> 
> Given its not-so-widespread use (and even outright critique of its
> very raison d'être), I see no need to describe this in the manual.  If
> and when its use becomes more widespread, we could consider that.  For
> now, it will just bloat the manual.

Perhaps its not-so-widespread use is _partly_ due to the lack
of more helpful doc?

I agree about the Elisp manual, FWIW.  I don't agree that `rx'
is adequately doc'd, at least not in terms of helping people
learn it and understand the relation between its constructs
and those of regular expressions.

To learn to use `rx' in place of regexps (or together with
regexps), the doc string is not help enough.  It's fine as a
doc string, but something more (e.g. an `rx' manual) would be
helpful.

I'm thinking, in particular, of people who are familiar
with regexps (Elisp or other) but not with `rx'.
 
> > 2. Please document (in the doc string of `rx', if nowhere else) the
> >    correspondences between each of the `rx' constructs and regexp
> >    syntax.  At least please document the most important ones.  For
> >    example, `zero-or-more' presumably corresponds to postfix regexp
> >    char `*'.
> 
> Really?  Doesn't "zero-or-more" define the effect as clearly as
> possible?  I think it does.

Perhaps you're missing the point.  Yes, `zero-or-more'
describes the effect.  No, it does not tell you which
`rx' construct corresponds to `*' in a regexp.  Again,
I'm thinking, in particular, of people who are familiar
with regexps (Elisp or other) but not with `rx'.

Documenting the correspondence explicitly, especially for
the direction regexp-construct-TO-rx-construct, would be
a step toward the ability to go back and forth more easily.

Ideally, you could put your cursor on a
regexp in some code and hit a key to:
 * see a corresponding `rx' sexp and
 * optionally replace the regexp with the `rx' sexp.

> > 3. Please consider reordering the doc-string text to cover more
> >    commonly used and more important constructs before those less
> >    likely to be used.  E.g., `not', `and', and `or', seem more
> >    common and more important than `category'.
> 
> "Important" is in the eyes of the beholder.  I don't see why the
> current order is wrong.  If anything, it starts from "atoms" and moves
> to "expressions", which is IMO no less important than any other
> "importance" grade.

OK, forget "important".  You chose to ignore "more commonly
used".  Please consider that.

You must scan 212 lines (!) of doc string before you get to
`and' (aka `seq', aka `:', aka `sequence'), which tells you
how to write a sequence of patterns.

Again, it's not so important for a doc string, which is
essentially reference doc, not help-you-learn doc.  But
with nothing except the doc string to go on, it takes
some trudging through more rarely used stuff (I mentioned
categories) just to get to stuff that is likely to be
used often.

> Having said all that, if someone wants to work on this and thinks they
> can improve on the current state of affairs, feel free.

I certainly _hope_ people feel free to help.  I guess
you say that to make clear that you are leaving the
request open.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31698; Package emacs. (Sun, 03 Jun 2018 18:31:01 GMT)

Message #14 received at 31698 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 31698 <at> debbugs.gnu.org
Subject: Re: bug#31698: 27.0; `rx' help: Show equivalent regexp constructs
Date: Sun, 03 Jun 2018 21:30:18 +0300

> Date: Sun, 3 Jun 2018 10:53:29 -0700 (PDT)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 31698 <at> debbugs.gnu.org
> 
> > Given its not-so-widespread use (and even outright critique of its
> > very raison d'être), I see no need to describe this in the manual.  If
> > and when its use becomes more widespread, we could consider that.  For
> > now, it will just bloat the manual.
> 
> Perhaps its not-so-widespread use is _partly_ due to the lack
> of more helpful doc?

I very much doubt that, and the recent discussion seems to concur.

> I'm thinking, in particular, of people who are familiar
> with regexps (Elisp or other) but not with `rx'.

I'm one such person, and yet I see no problem with the current
documentation.

> > > 2. Please document (in the doc string of `rx', if nowhere else) the
> > >    correspondences between each of the `rx' constructs and regexp
> > >    syntax.  At least please document the most important ones.  For
> > >    example, `zero-or-more' presumably corresponds to postfix regexp
> > >    char `*'.
> > 
> > Really?  Doesn't "zero-or-more" define the effect as clearly as
> > possible?  I think it does.
> 
> Perhaps you're missing the point.  Yes, `zero-or-more'
> describes the effect.  No, it does not tell you which
> `rx' construct corresponds to `*' in a regexp.  Again,
> I'm thinking, in particular, of people who are familiar
> with regexps (Elisp or other) but not with `rx'.

Again, I'm one such person, and it was immediately clear to me what
'zero-or-more' translates to.

> You must scan 212 lines (!) of doc string before you get to
> `and' (aka `seq', aka `:', aka `sequence'), which tells you
> how to write a sequence of patterns.

There will always be something for which you will need to scan 212
lines before you get to it.  There isn't too much one can say on a
single line, so something's gotta give.

> > Having said all that, if someone wants to work on this and thinks they
> > can improve on the current state of affairs, feel free.
> 
> I certainly _hope_ people feel free to help.  I guess
> you say that to make clear that you are leaving the
> request open.

Did you see me close it?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31698; Package emacs. (Sun, 03 Jun 2018 20:04:01 GMT)

Message #17 received at 31698 <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 31698 <at> debbugs.gnu.org
Subject: RE: bug#31698: 27.0; `rx' help: Show equivalent regexp constructs
Date: Sun, 3 Jun 2018 13:02:53 -0700 (PDT)

> > Perhaps its not-so-widespread use is _partly_ due to
> > the lack of more helpful doc?
> 
> I very much doubt that, and the recent discussion seems to concur.

No, I don't think it does.  The recent discussion pointed
out other reasons why it is not more widely used.  And I'm
one of those in that discussion who explicitly agreed with
those other reasons.

And FWIW I think those reasons (verbosity etc.) are more
important than the reason given in this report.  There is,
however, nothing in that discussion that argues that the
reason given here is not relevant.

> > I'm thinking, in particular, of people who are familiar
> > with regexps (Elisp or other) but not with `rx'.
> 
> I'm one such person, and yet I see no problem with the
> current documentation.

Yes, you've made that clear.  But please "feel free" to
say it again.

I too am one such person, and guess what... 

> > > > 2. Please document (in the doc string of `rx', if nowhere else) the
> > > >    correspondences between each of the `rx' constructs and regexp
> > > >    syntax.  At least please document the most important ones.  For
> > > >    example, `zero-or-more' presumably corresponds to postfix regexp
> > > >    char `*'.
> > >
> > > Really?  Doesn't "zero-or-more" define the effect as 
> > > clearly as possible?  I think it does.
> >
> > Perhaps you're missing the point.  Yes, `zero-or-more'
> > describes the effect.  No, it does not tell you which
> > `rx' construct corresponds to `*' in a regexp.  Again,
> > I'm thinking, in particular, of people who are familiar
> > with regexps (Elisp or other) but not with `rx'.
> 
> Again, I'm one such person, and it was immediately clear
> to me what 'zero-or-more' translates to.

Again, it's about the other direction.  Not finding out
what `zero-or-more' means or translates to in a regexp,
but finding out what `*' in a regexp translates to in `rx'.

Quick, what does a shy regexp group translate to in `rx'?
(There is no correspondence, because none is needed.)

In general, it's _not obvious_ how a given regexp would
be translated to `rx'.  It would be helpful to be able
to easily translate regexps to `rx' sexps.

The doc for `rx' could help with that by providing an
explicit mapping between the two.  Do you disagree that
that would be helpful?

The mapping exists in the code, of course, but only in
the direction rx-to-regexp.  For someone new to `rx'
who wants to analyze a regexp into its `rx' constituents,
or who wants to replace a regexp by an equivalent `rx'
sexp, documenting a regexp-to-rx mapping would help.

And (as you've said more than once) "I'm one such person."

In addition, it would be good to have a Lisp function
that performs a regexp-to-rx translation.
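
A full translator is more than a short example can cover, but the
documented mapping asked for above could start as a simple lookup
table.  The sketch below is hypothetical: `my-regexp-to-rx-alist' and
`my-rx-construct-for' are illustrative names, not part of Emacs, and
the table covers only a handful of constructs.

```elisp
;; Hypothetical lookup table: regexp syntax -> `rx' construct.
(defconst my-regexp-to-rx-alist
  '(("*"   . zero-or-more)
    ("+"   . one-or-more)
    ("?"   . zero-or-one)
    ("\\|" . or)
    ("\\(" . group)
    ("^"   . bol)
    ("$"   . eol)
    ("."   . nonl))
  "Partial mapping from regexp operators to `rx' constructs.")

(defun my-rx-construct-for (operator)
  "Return the `rx' construct for regexp OPERATOR, or nil if unlisted."
  (cdr (assoc operator my-regexp-to-rx-alist)))

(my-rx-construct-for "*")      ; => zero-or-more
(my-rx-construct-for "\\(?:")  ; => nil; shy groups need no `rx' counterpart
```

Such a table only names the construct; turning a whole regexp string
into an `rx' sexp would additionally require parsing the regexp.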

> > You must scan 212 lines (!) of doc string before you get to
> > `and' (aka `seq', aka `:', aka `sequence'), which tells you
> > how to write a sequence of patterns.
>
> There will always be something for which you will need to scan 212
> lines before you get to it.  There isn't too much one can say on a
> single line, so something's gotta give.

That's an argument that says only that different orders
are possible.  Unless you are trying to make the even
less useful argument that the order chosen makes no
difference.

Clearly, not everything can be stated first.  Such a
truism has no relevance for choosing which order to use.
Different orders serve different purposes.

> > > Having said all that, if someone wants to work on this and thinks
> > > they can improve on the current state of affairs, feel free.
> >
> > I certainly _hope_ people feel free to help.  I guess
> > you say that to make clear that you are leaving the
> > request open.
> 
> Did you see me close it?

Did I say you closed it?




Merged 31698 36496. Request was from Mattias Engdegård <mattiase <at> acm.org> to control <at> debbugs.gnu.org. (Sun, 07 Jul 2019 10:16:01 GMT)

Removed tag(s) patch. Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 17 Jul 2019 00:29:02 GMT)

bug closed, send any further explanations to 36496 <at> debbugs.gnu.org and Mattias Engdegård <mattiase <at> acm.org> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Mon, 25 Apr 2022 15:13:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 24 May 2022 11:24:10 GMT)

This bug report was last modified 3 years and 85 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.