GNU bug report logs -
#25366
26.0.50; [:blank:] character class should match all Unicode horizontal whitespace
Previous Next
Reported by: Philipp Stephani <p.stephani2 <at> gmail.com>
Date: Thu, 5 Jan 2017 13:47:02 UTC
Severity: wishlist
Tags: confirmed
Found in version 26.0.50
Done: Philipp Stephani <p.stephani2 <at> gmail.com>
Bug is archived. No further changes may be made.
Full log
View this message in rfc822 format
[Message part 1 (text/plain, inline)]
Your message dated Fri, 06 Jan 2017 19:21:05 +0000
with message-id <CAArVCkTbYOPOp1RB=66F96pq07_z5wwV8PQ=2cCyLS8Uk1dj-g <at> mail.gmail.com>
and subject line Re: bug#25366: 26.0.50; [:blank:] character class should match all Unicode horizontal whitespace
has caused the debbugs.gnu.org bug report #25366,
regarding 26.0.50; [:blank:] character class should match all Unicode horizontal whitespace
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
25366: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=25366
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
(string-match-p "[[:blank:]]" "\N{HAIR SPACE}")
=> nil, expected 0
[[:blank:]] should be the same as \h in PRCE.
In GNU Emacs 26.0.50.26 (x86_64-unknown-linux-gnu, GTK+ Version 3.10.8)
of 2017-01-05 built on unknown
Repository revision: d88cdad2847726438c7d1de9fd2651c4be9243aa
Windowing system distributor 'The X.Org Foundation', version 11.0.11501000
System Description: Ubuntu 14.04 LTS
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Entering debugger...
Back to top level
Configured using:
'configure --with-modules --enable-checking
--enable-check-lisp-object-type 'CFLAGS=-ggdb3 -O0''
Configured features:
XPM JPEG TIFF GIF PNG SOUND GSETTINGS NOTIFY GNUTLS FREETYPE XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message subr-x puny seq byte-opt gv
bytecomp byte-compile cl-extra cconv dired dired-loaddefs format-spec
rfc822 mml mml-sec password-cache epa derived epg epg-config gnus-util
rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils help-mode easymenu cl-loaddefs pcase
cl-lib debug time-date mule-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow isearch timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript case-table epa-hook jka-cmpr-hook help
simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs button
faces cus-face macroexp files text-properties overlay sha1 md5 base64
format env code-pages mule custom widget hashtable-print-readable
backquote inotify dynamic-setting system-font-setting
font-render-setting move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)
Memory information:
((conses 16 182571 10570)
(symbols 48 31257 1)
(miscs 40 340 231)
(strings 32 71112 6419)
(string-bytes 1 1678721)
(vectors 16 14561)
(vector-slots 8 529555 10250)
(floats 8 183 150)
(intervals 56 250 6)
(buffers 976 13)
(heap 1024 36602 1391))
--
Google Germany GmbH
Erika-Mann-Straße 33
80636 München
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Diese E-Mail ist vertraulich. Wenn Sie nicht der richtige Adressat sind,
leiten Sie diese bitte nicht weiter, informieren Sie den Absender und löschen
Sie die E-Mail und alle Anhänge. Vielen Dank.
This e-mail is confidential. If you are not the right addressee please do not
forward it, please inform the sender, and please erase this e-mail including
any attachments. Thanks.
[Message part 3 (message/rfc822, inline)]
[Message part 4 (text/plain, inline)]
Philipp Stephani <p.stephani2 <at> gmail.com> schrieb am Fr., 6. Jan. 2017 um
20:10 Uhr:
> Eli Zaretskii <eliz <at> gnu.org> schrieb am Fr., 6. Jan. 2017 um 16:11 Uhr:
>
> > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > Date: Fri, 06 Jan 2017 15:00:22 +0000
> > Cc: 25366 <at> debbugs.gnu.org
> >
> >
> http://www.unicode.org/reports/tr18/tr18-19.html#Compatibility_Properties
> >
> > Patches to that effect are welcome.
> >
> > Here's a patch.
>
> Thanks. A few minor comments below.
>
> > +/* Return true if C is a horizontal whitespace character, as defined
> > + by http://www.unicode.org/reports/tr18/tr18-19.html#blank. */
> > +bool
> > +blankp (int c)
> > +{
> > + if (c == '\t')
> > + return true;
>
> Why does this test explicitly only for a TAB? What about SPC, for
> example?
>
>
> Because TAB is the only character that is blank, but doesn't have the
> general category Zs.
> I've now also included space and added a comment. The risk that the
> general category of space will ever be changed seems very small.
>
>
>
> > --- a/doc/lispref/searching.texi
> > +++ b/doc/lispref/searching.texi
> > @@ -553,7 +553,10 @@ Char Classes
> > (@pxref{Character Properties}) indicates they are alphabetic
> > characters.
> > @item [:blank:]
> > -This matches space and tab only.
> > +This matches horizontal whitespace, as defined by Unicode Technical
> > +Standard #18. In particular, it matches tabs and characters whose
> > +Unicode @samp{general-category} property (@pxref{Character
> > +Properties}) indicates they are spacing separators.
>
> Similarly here: I find the lack of reference to a space potentially
> confusing.
>
>
> Added.
>
>
>
> > +** The regular expression character class [:blank:] now matches
> > +Unicode horizontal whitespace as defined in
> > +http://www.unicode.org/reports/tr18/tr18-19.html#blank.
>
> The reference to a particular version of UTS#18 might become obsolete
> when a new version is released. So I suggest to provide a general
> reference to the report and its section, not an exact URL.
>
>
> Done.
>
Pushed to master as 512e9886be.
[Message part 5 (text/html, inline)]
This bug report was last modified 8 years and 194 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.