GNU bug report logs - #25366
26.0.50; [:blank:] character class should match all Unicode horizontal whitespace

Package: emacs;

Reported by: Philipp Stephani <p.stephani2 <at> gmail.com>

Date: Thu, 5 Jan 2017 13:47:02 UTC

Severity: wishlist

Tags: confirmed

Found in version 26.0.50

Done: Philipp Stephani <p.stephani2 <at> gmail.com>

Bug is archived. No further changes may be made.


From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 25366 <at> debbugs.gnu.org
Subject: bug#25366: 26.0.50; [:blank:] character class should match all Unicode horizontal whitespace
Date: Fri, 06 Jan 2017 19:10:57 +0000
Eli Zaretskii <eliz <at> gnu.org> wrote on Fri, 6 Jan 2017 at 16:11:

> > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > Date: Fri, 06 Jan 2017 15:00:22 +0000
> > Cc: 25366 <at> debbugs.gnu.org
> >
> > http://www.unicode.org/reports/tr18/tr18-19.html#Compatibility_Properties
> >
> >  Patches to that effect are welcome.
> >
> > Here's a patch.
>
> Thanks.  A few minor comments below.
>
> > +/* Return true if C is a horizontal whitespace character, as defined
> > +   by http://www.unicode.org/reports/tr18/tr18-19.html#blank.  */
> > +bool
> > +blankp (int c)
> > +{
> > +  if (c == '\t')
> > +    return true;
>
> Why does this test explicitly only for a TAB?  What about SPC, for
> example?
>

Because TAB is the only blank character that doesn't have the general
category Zs.
I've now also included space and added a comment. The risk that the general
category of space will ever change seems very small.
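
For readers following along, the check described above can be sketched
roughly as follows. This is a minimal, self-contained illustration and
not the committed patch: the names blankp_sketch, is_space_separator
and zs_ranges are hypothetical, and the hard-coded list of Zs code
points (as assigned in recent Unicode versions) is an assumption that
stands in for a lookup in Emacs's own character property tables.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a general-category lookup: inclusive code
   point ranges whose Unicode general category is Zs (space separator).
   Assumption: Emacs would consult its character property tables
   rather than a hard-coded list like this one.  */
static const struct { int first, last; } zs_ranges[] = {
  { 0x0020, 0x0020 },  /* SPACE */
  { 0x00A0, 0x00A0 },  /* NO-BREAK SPACE */
  { 0x1680, 0x1680 },  /* OGHAM SPACE MARK */
  { 0x2000, 0x200A },  /* EN QUAD .. HAIR SPACE */
  { 0x202F, 0x202F },  /* NARROW NO-BREAK SPACE */
  { 0x205F, 0x205F },  /* MEDIUM MATHEMATICAL SPACE */
  { 0x3000, 0x3000 },  /* IDEOGRAPHIC SPACE */
};

/* Return true if C falls in one of the Zs ranges above.  */
static bool
is_space_separator (int c)
{
  for (size_t i = 0; i < sizeof zs_ranges / sizeof zs_ranges[0]; i++)
    if (zs_ranges[i].first <= c && c <= zs_ranges[i].last)
      return true;
  return false;
}

/* Return true if C is horizontal whitespace in the sense discussed in
   this thread: TAB, or any character with general category Zs.  */
bool
blankp_sketch (int c)
{
  /* TAB has general category Cc, so it needs an explicit test.  SPC is
     already in Zs; testing it here is redundant but makes the intent
     obvious to the reader.  */
  if (c == '\t' || c == ' ')
    return true;
  return is_space_separator (c);
}
```

With this sketch, TAB, SPC, NO-BREAK SPACE (U+00A0) and IDEOGRAPHIC
SPACE (U+3000) count as blank, while newline and ZERO WIDTH SPACE
(U+200B, general category Cf) do not.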


>
> > --- a/doc/lispref/searching.texi
> > +++ b/doc/lispref/searching.texi
> > @@ -553,7 +553,10 @@ Char Classes
> >  (@pxref{Character Properties}) indicates they are alphabetic
> >  characters.
> >  @item [:blank:]
> > -This matches space and tab only.
> > +This matches horizontal whitespace, as defined by Unicode Technical
> > +Standard #18.  In particular, it matches tabs and characters whose
> > +Unicode @samp{general-category} property (@pxref{Character
> > +Properties}) indicates they are spacing separators.
>
> Similarly here: I find the lack of reference to a space potentially
> confusing.
>

Added.


>
> > +** The regular expression character class [:blank:] now matches
> > +Unicode horizontal whitespace as defined in
> > +http://www.unicode.org/reports/tr18/tr18-19.html#blank.
>
> The reference to a particular version of UTS#18 might become obsolete
> when a new version is released.  So I suggest to provide a general
> reference to the report and its section, not an exact URL.
>

Done.
