GNU bug report logs - #36251
Regex library doesn't recognize ']' in a character class
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36251 in the body.
You can then email your comments to 36251 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-guile <at> gnu.org: bug#36251; Package guile. (Sun, 16 Jun 2019 18:32:01 GMT) Full text and rfc822 format available.
Acknowledgement sent to Abdulrahman Semrie <hsamireh <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 16 Jun 2019 18:32:02 GMT) Full text and rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I am using the pattern [\\[\\]a-zA-Z]+ to match a string with a left or right bracket in it. However, the string-match function doesn’t match the ‘]’ character. To demonstrate with an example, try the following function:
(string-match "[\\[\\]a-zA-Z]+" "Text[ab]")
The result of the above call should have been a match structure with Text[ab] matched. However, string-match returns #f, which is incorrect. To test whether the pattern I am using was right, I tried it on regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is the link that demonstrates that it works.
Hence, the above leads me to believe there is a bug in the regex library that mishandles the ] character in character classes.
—
Regards,
Abdulrahman Semrie
[Message part 2 (text/html, inline)]
Information forwarded to bug-guile <at> gnu.org: bug#36251; Package guile. (Sun, 16 Jun 2019 19:41:01 GMT) Full text and rfc822 format available.
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Sun, Jun 16, 2019 at 08:16:29PM +0300, Abdulrahman Semrie wrote:
>
> I am using the pattern [\\[\\]a-zA-Z]+ to match a string with a left or right bracket in it. However, the string-match function doesn’t match the ‘]’ character. To demonstrate with an example, try the following function:
>
> (string-match "[\\[\\]a-zA-Z]+" "Text[ab]")
>
> The result of the above call should have been a match structure with Text[ab] matched. However, string-match returns #f, which is incorrect. To test whether the pattern I am using was right, I tried it on regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is the link that demonstrates that it works.
>
> Hence, the above leads me to believe there is a bug in the regex library that mishandles the ] character in character classes.
If I understood you correctly, you are using POSIX regular
expressions. Within a bracket expression ([...]), you can't
escape ']' with a backslash. Just put the ] as first character,
like so:
[][a-zA-Z]
Quoting the man page (regex(7)):
A bracket expression is a list of characters enclosed in "[]".
It normally matches any single character from the list (but see
below). If the list begins with '^', it matches any single
character (but see below) not from the rest of the list. [...]
To include a literal ']' in the list, make it the first
character (following a possible '^'). To include a literal
'-', make it the first or last character, or the second endpoint
of a range [...]
See also [1], but the man page is more complete.
(I'm assuming your Guile is linked against some POSIX regex library).
Cheers
-- t
[signature.asc (application/pgp-signature, inline)]
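For readers following along, the suggestion above can be tried directly in Guile. This is a minimal sketch, assuming Guile's (ice-9 regex) module is backed by a POSIX-conforming regex library such as glibc's; the names `pattern` and `m` are illustrative only.

```scheme
(use-modules (ice-9 regex))

;; ']' is the first character of the bracket expression, so it is taken
;; literally; '[' needs no escaping inside the brackets.
(define pattern "[][a-zA-Z]+")

(define m (string-match pattern "Text[ab]"))

;; string-match returns a match structure on success and #f on failure.
(display (and m (match:substring m)))   ; prints Text[ab]
(newline)
```

To exclude both brackets instead, the same rule applies after the caret: `[^][]` should match any single character other than ']' or '['.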
Information forwarded to bug-guile <at> gnu.org: bug#36251; Package guile. (Tue, 18 Jun 2019 11:11:02 GMT) Full text and rfc822 format available.
Message #11 received at 36251 <at> debbugs.gnu.org (full text, mbox):
Hi,
Abdulrahman Semrie <hsamireh <at> gmail.com> writes:
> I am using the pattern [\\[\\]a-zA-Z]+ to match a string with a left or
> right bracket in it. However, the string-match function doesn’t match
> the ‘]’ character. To demonstrate with an example, try the following
> function:
>
> (string-match "[\\[\\]a-zA-Z]+" "Text[ab]")
>
> The result of the above call should have been a match structure
> with Text[ab] matched. However, string-match returns #f, which is
> incorrect. To test whether the pattern I am using was right, I tried it
> on regex101.com and it works. Here (https://regex101.com/r/VAl6aI/1) is
> the link that demonstrates that it works.
It turns out that there are several flavors of regular expressions in
common use, with different features and syntax. The link you provided
is using PCRE (PHP) regular expressions (see the "flavor" pane on the
left), and there are three other supported flavors on that web site.
Guile's (ice-9 regex) module provides a simpler flavor of regexps known
as "POSIX extended regular expressions", implemented as a thin wrapper
around your system's POSIX regular expression library ('regcomp' and
'regexec'). The web site you referenced does not appear to support
POSIX extended regular expressions, but here are some links about them:
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04
One of the notable differences is that in POSIX extended regular
expressions, character classes do not support backslash escapes, but
instead use a more ad-hoc approach as <tomas <at> tuxteam.de> described.
Regards,
Mark
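To see why the original pattern returned #f rather than signalling an error, it may help to spell out how a POSIX library reads it. The sketch below assumes the same setup as in the report (Guile's (ice-9 regex) on top of a conforming regcomp/regexec, for example glibc); `original` and `matched-text` are hypothetical names introduced here, not part of Guile.

```scheme
(use-modules (ice-9 regex))

;; After Scheme string unescaping, the regex text is  [\[\]a-zA-Z]+
(define original "[\\[\\]a-zA-Z]+")

;; POSIX reading of that text:
;;   [\[\]    bracket expression matching a backslash or '['
;;   a-zA-Z   the seven literal characters a - z A - Z
;;   ]+       one or more literal ']'
(define (matched-text str)
  (let ((m (string-match original str)))
    (and m (match:substring m))))

(display (matched-text "Text[ab]"))    ; prints #f, as in the report
(newline)
(display (matched-text "[a-zA-Z]]"))   ; prints [a-zA-Z]]
(newline)
```

So the pattern is still valid as a POSIX extended regular expression; it just describes a different set of strings than the PCRE reading does.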
Information forwarded to bug-guile <at> gnu.org: bug#36251; Package guile. (Tue, 18 Jun 2019 11:21:01 GMT) Full text and rfc822 format available.
Message #14 received at 36251 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Tue, Jun 18, 2019 at 07:08:06AM -0400, Mark H Weaver wrote:
> Hi,
>
> Abdulrahman Semrie <hsamireh <at> gmail.com> writes:
>
> > I am using the pattern [\\[\\]a-zA-Z]+ to match a string with left or
> > right bracket in it [...]
> It turns out that there are several flavors of regular expressions in
> common use, with different features and syntax. The link you provided
> is using PCRE (PHP) regular expressions (see the "flavor" pane on the
> left), and there are three other supported flavors on that web site.
>
> Guile's (ice-9 regex) module provides a simpler flavor of regexps known
> as "POSIX extended regular expressions" [...]
D'oh! I forgot about Perl-compatible regexps. In those, you /can/ escape
things with a backslash within [...]. This would have explained Abdulrahman's
confusion better.
Thanks, Mark
-- t
[signature.asc (application/pgp-signature, inline)]
Information forwarded to bug-guile <at> gnu.org: bug#36251; Package guile. (Fri, 28 Jun 2019 11:22:02 GMT) Full text and rfc822 format available.
Message #17 received at 36251 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
> ...
> It turns out that there are several flavors of regular expressions in
> common use, with different features and syntax. The link you provided
> is using PCRE (PHP) regular expressions (see the "flavor" pane on the
> left), and there are three other supported flavors on that web site.
> ...
Fwiw, I just came across a pcre binding for guile(*), here:
https://github.com/NalaGinrut/guile-pcre-ffi
I didn't try it and I have no idea about the general quality and robustness of the
binding, last updated 4y ago it seems, but the code is really small, uses the ffi,
so it should be quite easy to patch if necessary and may be fun to 'resurrect' ...
David
(*) I found it while looking for something else, here:
http://sph.mn/foreign/guile-software.html
[Message part 2 (application/pgp-signature, inline)]
Added tag(s) notabug. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 30 Jun 2019 19:40:02 GMT) Full text and rfc822 format available.
bug closed, send any further explanations to 36251 <at> debbugs.gnu.org and Abdulrahman Semrie <hsamireh <at> gmail.com>. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 30 Jun 2019 19:40:02 GMT) Full text and rfc822 format available.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 29 Jul 2019 11:24:05 GMT) Full text and rfc822 format available.
This bug report was last modified 6 years and 14 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.