GNU bug report logs - #25366
26.0.50; [:blank:] character class should match all Unicode horizontal whitespace
Reported by: Philipp Stephani <p.stephani2 <at> gmail.com>
Date: Thu, 5 Jan 2017 13:47:02 UTC
Severity: wishlist
Tags: confirmed
Found in version 26.0.50
Done: Philipp Stephani <p.stephani2 <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 25366 in the body.
You can then email your comments to 25366 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#25366; Package emacs. (Thu, 05 Jan 2017 13:47:02 GMT) Full text and rfc822 format available.
Acknowledgement sent to Philipp Stephani <p.stephani2 <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 05 Jan 2017 13:47:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
(string-match-p "[[:blank:]]" "\N{HAIR SPACE}")
=> nil, expected 0
[[:blank:]] should be the same as \h in PCRE.
In GNU Emacs 26.0.50.26 (x86_64-unknown-linux-gnu, GTK+ Version 3.10.8)
of 2017-01-05 built on unknown
Repository revision: d88cdad2847726438c7d1de9fd2651c4be9243aa
Windowing system distributor 'The X.Org Foundation', version 11.0.11501000
System Description: Ubuntu 14.04 LTS
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Entering debugger...
Back to top level
Configured using:
'configure --with-modules --enable-checking
--enable-check-lisp-object-type 'CFLAGS=-ggdb3 -O0''
Configured features:
XPM JPEG TIFF GIF PNG SOUND GSETTINGS NOTIFY GNUTLS FREETYPE XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message subr-x puny seq byte-opt gv
bytecomp byte-compile cl-extra cconv dired dired-loaddefs format-spec
rfc822 mml mml-sec password-cache epa derived epg epg-config gnus-util
rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils help-mode easymenu cl-loaddefs pcase
cl-lib debug time-date mule-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow isearch timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript case-table epa-hook jka-cmpr-hook help
simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs button
faces cus-face macroexp files text-properties overlay sha1 md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote inotify dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)
Memory information:
((conses 16 182571 10570)
(symbols 48 31257 1)
(miscs 40 340 231)
(strings 32 71112 6419)
(string-bytes 1 1678721)
(vectors 16 14561)
(vector-slots 8 529555 10250)
(floats 8 183 150)
(intervals 56 250 6)
(buffers 976 13)
(heap 1024 36602 1391))
--
Google Germany GmbH
Erika-Mann-Straße 33
80636 München
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg
Managing directors: Matthew Scott Sucherman, Paul Terence Manicle
This e-mail is confidential. If you are not the right addressee please do not
forward it, please inform the sender, and please erase this e-mail including
any attachments. Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#25366; Package emacs. (Thu, 05 Jan 2017 15:51:02 GMT)
Message #8 received at 25366 <at> debbugs.gnu.org (full text, mbox):
> From: Philipp Stephani <p.stephani2 <at> gmail.com>
> Date: Thu, 05 Jan 2017 14:46:01 +0100
>
> (string-match-p "[[:blank:]]" "\N{HAIR SPACE}")
> => nil, expected 0
>
> [[:blank:]] should be the same as \h in PCRE.
We are consistent with our documentation, but I agree that it would be
good to extend [:blank:], as proposed here:
http://www.unicode.org/reports/tr18/tr18-19.html#Compatibility_Properties
Patches to that effect are welcome.
Severity set to 'wishlist' from 'normal'. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Thu, 05 Jan 2017 23:08:01 GMT)
Added tag(s) confirmed. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Thu, 05 Jan 2017 23:08:01 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#25366; Package emacs. (Fri, 06 Jan 2017 15:01:02 GMT)
Message #15 received at 25366 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> wrote on Thu, 5 Jan 2017 at 16:50:
> > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > Date: Thu, 05 Jan 2017 14:46:01 +0100
> >
> > (string-match-p "[[:blank:]]" "\N{HAIR SPACE}")
> > => nil, expected 0
> >
> > [[:blank:]] should be the same as \h in PCRE.
>
> We are consistent with our documentation, but I agree that it would be
> good to extend [:blank:], as proposed here:
>
>
> http://www.unicode.org/reports/tr18/tr18-19.html#Compatibility_Properties
>
> Patches to that effect are welcome.
>
Here's a patch.
[0001-Add-support-for-Unicode-whitespace-in-blank.txt (text/plain, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#25366; Package emacs. (Fri, 06 Jan 2017 15:12:01 GMT)
Message #18 received at 25366 <at> debbugs.gnu.org (full text, mbox):
> From: Philipp Stephani <p.stephani2 <at> gmail.com>
> Date: Fri, 06 Jan 2017 15:00:22 +0000
> Cc: 25366 <at> debbugs.gnu.org
>
> http://www.unicode.org/reports/tr18/tr18-19.html#Compatibility_Properties
>
> Patches to that effect are welcome.
>
> Here's a patch.
Thanks. A few minor comments below.
> +/* Return true if C is a horizontal whitespace character, as defined
> +   by http://www.unicode.org/reports/tr18/tr18-19.html#blank.  */
> +bool
> +blankp (int c)
> +{
> +  if (c == '\t')
> +    return true;
Why does this test explicitly only for a TAB? What about SPC, for
example?
> --- a/doc/lispref/searching.texi
> +++ b/doc/lispref/searching.texi
> @@ -553,7 +553,10 @@ Char Classes
> (@pxref{Character Properties}) indicates they are alphabetic
> characters.
> @item [:blank:]
> -This matches space and tab only.
> +This matches horizontal whitespace, as defined by Unicode Technical
> +Standard #18. In particular, it matches tabs and characters whose
> +Unicode @samp{general-category} property (@pxref{Character
> +Properties}) indicates they are spacing separators.
Similarly here: I find the lack of reference to a space potentially
confusing.
> +** The regular expression character class [:blank:] now matches
> +Unicode horizontal whitespace as defined in
> +http://www.unicode.org/reports/tr18/tr18-19.html#blank.
The reference to a particular version of UTS#18 might become obsolete
when a new version is released. So I suggest providing a general
reference to the report and its section, not an exact URL.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#25366; Package emacs. (Fri, 06 Jan 2017 19:12:02 GMT)
Message #21 received at 25366 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> wrote on Fri, 6 Jan 2017 at 16:11:
> > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > Date: Fri, 06 Jan 2017 15:00:22 +0000
> > Cc: 25366 <at> debbugs.gnu.org
> >
> >
> http://www.unicode.org/reports/tr18/tr18-19.html#Compatibility_Properties
> >
> > Patches to that effect are welcome.
> >
> > Here's a patch.
>
> Thanks. A few minor comments below.
>
> > +/* Return true if C is a horizontal whitespace character, as defined
> > + by http://www.unicode.org/reports/tr18/tr18-19.html#blank. */
> > +bool
> > +blankp (int c)
> > +{
> > + if (c == '\t')
> > + return true;
>
> Why does this test explicitly only for a TAB? What about SPC, for
> example?
>
Because TAB is the only blank character whose general category is not Zs.
I've now also included space explicitly and added a comment. The risk that
the general category of space will ever change seems very small.
>
> > --- a/doc/lispref/searching.texi
> > +++ b/doc/lispref/searching.texi
> > @@ -553,7 +553,10 @@ Char Classes
> > (@pxref{Character Properties}) indicates they are alphabetic
> > characters.
> > @item [:blank:]
> > -This matches space and tab only.
> > +This matches horizontal whitespace, as defined by Unicode Technical
> > +Standard #18. In particular, it matches tabs and characters whose
> > +Unicode @samp{general-category} property (@pxref{Character
> > +Properties}) indicates they are spacing separators.
>
> Similarly here: I find the lack of reference to a space potentially
> confusing.
>
Added.
>
> > +** The regular expression character class [:blank:] now matches
> > +Unicode horizontal whitespace as defined in
> > +http://www.unicode.org/reports/tr18/tr18-19.html#blank.
>
> The reference to a particular version of UTS#18 might become obsolete
> when a new version is released. So I suggest to provide a general
> reference to the report and its section, not an exact URL.
>
Done.
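
The rule discussed in this message (a character is blank if it is TAB or has general category Zs) can be sketched in standalone C. This is only an illustration, not the code from the attached patch: the names `legacy_blankp`, `is_space_separator`, and `sketch_blankp` are hypothetical, and the Zs code points are hard-coded here as an assumption (the list as of Unicode 6.3 and later), whereas real code would presumably look the category up in a character-property table.

```c
#include <stdbool.h>

/* Old documented behavior: "space and tab only".
   Hypothetical name, used only for comparison.  */
bool
legacy_blankp (int c)
{
  return c == ' ' || c == '\t';
}

/* Hypothetical helper: true if C has Unicode general category Zs
   (Space_Separator).  The list is hard-coded for illustration and
   reflects Unicode 6.3 and later, where U+180E is no longer Zs.  */
bool
is_space_separator (int c)
{
  return c == 0x0020                      /* SPACE */
         || c == 0x00A0                   /* NO-BREAK SPACE */
         || c == 0x1680                   /* OGHAM SPACE MARK */
         || (c >= 0x2000 && c <= 0x200A)  /* EN QUAD .. HAIR SPACE */
         || c == 0x202F                   /* NARROW NO-BREAK SPACE */
         || c == 0x205F                   /* MEDIUM MATHEMATICAL SPACE */
         || c == 0x3000;                  /* IDEOGRAPHIC SPACE */
}

/* Horizontal whitespace in the sense described in the thread:
   TAB is the one blank character outside Zs, so it is tested
   explicitly; everything else comes from the Zs check.  */
bool
sketch_blankp (int c)
{
  return c == '\t' || is_space_separator (c);
}
```

With these definitions, `sketch_blankp (0x200A)` (HAIR SPACE) is true while `legacy_blankp (0x200A)` is false, which is the difference the original report describes. Newline and other vertical whitespace are rejected by both predicates.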
Reply sent to Philipp Stephani <p.stephani2 <at> gmail.com>: You have taken responsibility. (Fri, 06 Jan 2017 19:22:02 GMT)
Notification sent to Philipp Stephani <p.stephani2 <at> gmail.com>: bug acknowledged by developer. (Fri, 06 Jan 2017 19:22:02 GMT)
Message #26 received at 25366-done <at> debbugs.gnu.org (full text, mbox):
Philipp Stephani <p.stephani2 <at> gmail.com> wrote on Fri, 6 Jan 2017 at 20:10:
> Eli Zaretskii <eliz <at> gnu.org> wrote on Fri, 6 Jan 2017 at 16:11:
>
> > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > Date: Fri, 06 Jan 2017 15:00:22 +0000
> > Cc: 25366 <at> debbugs.gnu.org
> >
> >
> http://www.unicode.org/reports/tr18/tr18-19.html#Compatibility_Properties
> >
> > Patches to that effect are welcome.
> >
> > Here's a patch.
>
> Thanks. A few minor comments below.
>
> > +/* Return true if C is a horizontal whitespace character, as defined
> > + by http://www.unicode.org/reports/tr18/tr18-19.html#blank. */
> > +bool
> > +blankp (int c)
> > +{
> > + if (c == '\t')
> > + return true;
>
> Why does this test explicitly only for a TAB? What about SPC, for
> example?
>
>
> Because TAB is the only character that is blank, but doesn't have the
> general category Zs.
> I've now also included space and added a comment. The risk that the
> general category of space will ever be changed seems very small.
>
>
>
> > --- a/doc/lispref/searching.texi
> > +++ b/doc/lispref/searching.texi
> > @@ -553,7 +553,10 @@ Char Classes
> > (@pxref{Character Properties}) indicates they are alphabetic
> > characters.
> > @item [:blank:]
> > -This matches space and tab only.
> > +This matches horizontal whitespace, as defined by Unicode Technical
> > +Standard #18. In particular, it matches tabs and characters whose
> > +Unicode @samp{general-category} property (@pxref{Character
> > +Properties}) indicates they are spacing separators.
>
> Similarly here: I find the lack of reference to a space potentially
> confusing.
>
>
> Added.
>
>
>
> > +** The regular expression character class [:blank:] now matches
> > +Unicode horizontal whitespace as defined in
> > +http://www.unicode.org/reports/tr18/tr18-19.html#blank.
>
> The reference to a particular version of UTS#18 might become obsolete
> when a new version is released. So I suggest to provide a general
> reference to the report and its section, not an exact URL.
>
>
> Done.
>
Pushed to master as 512e9886be.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 04 Feb 2017 12:24:04 GMT)
This bug report was last modified 8 years and 193 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.