From unknown Sat Jun 14 14:26:32 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Resent-From: Philipp Stephani
Original-Sender: "Debbugs-submit"
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Tue, 22 Mar 2016 10:44:01 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: report 23086
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: 23086@debbugs.gnu.org
X-Debbugs-Original-To: bug-gnu-emacs@gnu.org
From: Philipp Stephani
Date: Tue, 22 Mar 2016 11:42:46 +0100
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Type some characters
C-x 8 RET LINE SEPARATOR (or PARAGRAPH SEPARATOR)
Type some more characters
M-q
Expected behavior: Emacs treats these characters as line and paragraph
separators: they are displayed as line breaks, M-q doesn't remove them,
and forward-paragraph etc. treat the paragraph separator as paragraph
end.
Actual behavior: These characters are displayed as one-pixel horizontal
whitespace and are otherwise ignored.
Also discussed in
https://lists.gnu.org/archive/html/emacs-devel/2015-08/msg01043.html.
https://www.emacswiki.org/emacs/unicode-whitespace.el supposedly adds
support for these characters, but I think proper treatment of Unicode
separators should be part of Emacs.
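For comparison only: some other environments already treat these characters as line boundaries. Python's `str.splitlines()`, for example, splits on U+2028 and U+2029 (among other separators, including U+0085), whereas a plain split on "\n" does not. This sketch just illustrates the requested "line break" semantics outside Emacs; it says nothing about how Emacs would implement them.

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR

text = "some characters" + LS + "some more characters" + PS + "next paragraph"

# Splitting on "\n" alone sees a single line: LS and PS are ordinary characters.
print(text.split("\n"))

# str.splitlines() treats LS and PS as line boundaries.
print(text.splitlines())
```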
In GNU Emacs 25.1.50.1 (x86_64-unknown-linux-gnu, GTK+ Version 3.10.8)
Repository revision: 780a605e1d2de4b975e6f1f29b491c9af419dcff
Windowing system distributor 'The X.Org Foundation', version 11.0.11501000
System Description: Ubuntu 14.04 LTS
Configured using:
'configure --with-modules --disable-build-details 'CFLAGS=-g -O0''
Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND GPM DBUS GCONF GSETTINGS NOTIFY ACL
LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES
Important settings:
value of $LANG: en_US.UTF-8
locale-coding-system: utf-8-unix
Major mode: Text
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Quit
Fill column set to 10 (was 70)
Quit
Making completion list...
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message dired dired-loaddefs format-spec
rfc822 mml easymenu mml-sec password-cache epa derived epg epg-config
gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils iso-transl time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core term/tty-colors frame
cl-generic cham georgian utf-8-lang misc-lang vietnamese tibetan thai
tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian
slovak czech european ethiopic indian cyrillic chinese charscript
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)
Memory information:
((conses 16 174467 8982)
(symbols 48 30106 0)
(miscs 40 468 148)
(strings 32 66519 6641)
(string-bytes 1 1505951)
(vectors 16 13333)
(vector-slots 8 488346 23035)
(floats 8 167 91)
(intervals 56 233 2)
(buffers 976 13)
(heap 1024 43667 1138))
From unknown Sat Jun 14 14:26:32 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Resent-From: Eli Zaretskii
Original-Sender: "Debbugs-submit"
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Tue, 22 Mar 2016 16:14:01 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 23086
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: Philipp Stephani
Cc: 23086@debbugs.gnu.org
Reply-To: Eli Zaretskii
Date: Tue, 22 Mar 2016 18:13:15 +0200
Message-Id: <831t725w4k.fsf@gnu.org>
From: Eli Zaretskii
In-reply-to: (message from
Philipp Stephani on Tue, 22 Mar 2016 11:42:46 +0100)
References:
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
> From: Philipp Stephani
> Date: Tue, 22 Mar 2016 11:42:46 +0100
>
> Type some characters
> C-x 8 RET LINE SEPARATOR (or PARAGRAPH SEPARATOR)
> Type some more characters
> M-q
>
> Expected behavior: Emacs treats these characters as line and paragraph
> separators: they are displayed as line breaks, M-q doesn't remove them,
> and forward-paragraph etc. treat the paragraph separator as paragraph
> end.
>
> Actual behavior: These characters are displayed as one-pixel horizontal
> whitespace and are otherwise ignored.
>
> Also discussed in
> https://lists.gnu.org/archive/html/emacs-devel/2015-08/msg01043.html.
> https://www.emacswiki.org/emacs/unicode-whitespace.el supposedly adds
> support for these characters, but I think proper treatment of Unicode
> separators should be part of Emacs.
It is not clear to me what exactly is the requested feature. Can you
propose a detailed list of requirements?
I'm asking because these characters come with non-trivial baggage in
Unicode, which goes far beyond just breaking the line; see
http://unicode.org/reports/tr14/
http://unicode.org/reports/tr29/
There are also implications for bidirectional display (it is
sensitive to where the line and the paragraph begin and end).
If we want to support these two characters, we should think about
which parts of the relevant functionality we want to see in Emacs,
because users will expect that. In addition, there are other
white-space characters defined by Unicode, and it would make sense to
treat them all alike. I'm not sure it makes sense to support just the
line-breaking and paragraph-separator parts of only these two
characters.
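To make the "other white-space characters" point concrete: Unicode gives each of these characters a general category, and U+2028 and U+2029 are the only members of their categories (Zl and Zp), while ordinary spaces are Zs and U+0085 is a control (Cc). The sketch below uses Python's `unicodedata` module purely to list those properties; the choice of sample code points is illustrative, not a proposed set.

```python
import unicodedata

# Code points mentioned in this thread, plus two other Unicode spaces for
# comparison (NO-BREAK SPACE and EM SPACE).
CANDIDATES = [0x000A, 0x0085, 0x00A0, 0x2003, 0x2028, 0x2029]

def describe(cp):
    """Return (label, name, general category, str.isspace()) for a code point."""
    ch = chr(cp)
    # Control characters such as U+000A have no Unicode name, hence the default.
    name = unicodedata.name(ch, "<no name>")
    # Zl = line separator, Zp = paragraph separator, Zs = space separator,
    # Cc = control.
    return (f"U+{cp:04X}", name, unicodedata.category(ch), ch.isspace())

for cp in CANDIDATES:
    print(*describe(cp))
```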
Then there are Emacs-specific issues, for example:
. do we treat u+2028 and u+2029 as literal characters, or as a form
of EOL encoding?
. if the former, how do we distinguish them from newlines on display?
. should Isearch find these when looking for "\n"? how about regexp
search for "$"?
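The "$" question is not specific to Emacs. As an analogy only, Python's `re` module recognizes just "\n" as a line boundary in multiline mode, so text separated by U+2028 is seen as one line: `$` does not match before it, and `.` matches it. This shows the behavior of one newline-only regex engine; it is not a statement about Emacs's regex.c.

```python
import re

LS = "\u2028"  # LINE SEPARATOR

def whole_line_words(text):
    """Return words that occupy an entire line, using multiline anchors."""
    return re.findall(r"^\w+$", text, re.MULTILINE)

def whole_lines(text):
    """Return every non-empty line as seen by '^.+$' in multiline mode."""
    return re.findall(r"^.+$", text, re.MULTILINE)

# With a real newline, '^' and '$' see two lines.
print(whole_line_words("foo\nbar"))
# With U+2028 there is no match: '$' does not fire before LS ...
print(whole_line_words("foo" + LS + "bar"))
# ... and '.' matches LS itself, so the whole string is a single "line".
print(whole_lines("foo" + LS + "bar"))
```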
There are probably more implications; these are just the ones that popped
into my mind in 5 sec. IOW, I think Someone™ should think this over and
present a detailed proposal.
Thanks.
From unknown Sat Jun 14 14:26:32 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Resent-From: John Wiegley
Original-Sender: "Debbugs-submit"
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Sun, 27 Mar 2016 00:21:03 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 23086
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: Eli Zaretskii
Cc: Philipp Stephani , 23086@debbugs.gnu.org
From: John Wiegley
In-Reply-To: <831t725w4k.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 22 Mar
2016 18:13:15 +0200")
Date: Sat, 26 Mar 2016 16:49:53 -0700
Message-ID:
References:
<831t725w4k.fsf@gnu.org>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.92 (darwin)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
>>>>> Eli Zaretskii writes:
> There are probably more implications; these are just the ones that popped
> into my mind in 5 sec. IOW, I think Someone™ should think this over and
> present a detailed proposal.
Very much agreed. Reading this bug description gives me that "There be
dragons" feeling. :)
-- 
John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2
From unknown Sat Jun 14 14:26:32 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#23086: 25.1.50; Emacs ignores Unicode line and paragraph separator characters
Resent-From: Eli Zaretskii
Original-Sender: "Debbugs-submit"
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Mon, 17 Jul 2017 15:10:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 23086
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: p.stephani2@gmail.com
Cc: 23086@debbugs.gnu.org
Reply-To: Eli Zaretskii
Date: Mon, 17 Jul 2017 18:09:46 +0300
Message-Id: <83o9sjcd6t.fsf@gnu.org>
From: Eli Zaretskii
In-reply-to: <831t725w4k.fsf@gnu.org> (message from Eli Zaretskii on Tue, 22
Mar 2016 18:13:15 +0200)
References:
<831t725w4k.fsf@gnu.org>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
> Date: Tue, 22 Mar 2016 18:13:15 +0200
> From: Eli Zaretskii
> Cc: 23086@debbugs.gnu.org
>
> > From: Philipp Stephani
> > Date: Tue, 22 Mar 2016 11:42:46 +0100
> >
> > Type some characters
> > C-x 8 RET LINE SEPARATOR (or PARAGRAPH SEPARATOR)
> > Type some more characters
> > M-q
> >
> > Expected behavior: Emacs treats these characters as line and paragraph
> > separators: they are displayed as line breaks, M-q doesn't remove them,
> > and forward-paragraph etc. treat the paragraph separator as paragraph
> > end.
> >
> > Actual behavior: These characters are displayed as one-pixel horizontal
> > whitespace and are otherwise ignored.
> >
> > Also discussed in
> > https://lists.gnu.org/archive/html/emacs-devel/2015-08/msg01043.html.
> > https://www.emacswiki.org/emacs/unicode-whitespace.el supposedly adds
> > support for these characters, but I think proper treatment of Unicode
> > separators should be part of Emacs.
>
> It is not clear to me what exactly is the requested feature. Can you
> propose a detailed list of requirements?
>
> I'm asking because these characters come with non-trivial baggage in
> Unicode, which goes far beyond just breaking the line; see
>
> http://unicode.org/reports/tr14/
> http://unicode.org/reports/tr29/
>
> There are also implications for bidirectional display (it is
> sensitive to where the line and the paragraph begin and end).
>
> If we want to support these two characters, we should think about
> which parts of the relevant functionality we want to see in Emacs,
> because users will expect that. In addition, there are other
> white-space characters defined by Unicode, and it would make sense to
> treat them all alike. I'm not sure it makes sense to support just the
> line-breaking and paragraph-separator parts of only these two
> characters.
>
> Then there are Emacs-specific issues, for example:
>
> . do we treat u+2028 and u+2029 as literal characters, or as a form
> of EOL encoding?
> . if the former, how do we distinguish them from newlines on display?
> . should Isearch find these when looking for "\n"? how about regexp
> search for "$"?
>
> There are probably more implications; these are just the ones that popped
> into my mind in 5 sec. IOW, I think Someone™ should think this over and
> present a detailed proposal.
So I've dusted off this year-old bug report and decided to improve
Emacs in this area. Here's what I propose:
. u+2028 and u+2029 (and perhaps also u+0085) will be treated as a form
of EOL encoding, which means they will not appear on display, and
will cause the next character to be displayed on the next screen line
. M-q will remove u+2028, as it removes newlines, and put newlines
at all EOLs as part of filling
. M-q will NOT remove u+2029, unless the user wants to refill several
paragraphs as a single paragraph, and there happens to be a u+2029
between some of the paragraphs
. forward-paragraph etc. will treat u+2029 as paragraph end
. bidi reordering will treat u+2029 as paragraph end
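A rough model of the filling rules in the list above, under stated assumptions: U+2029 is kept as a paragraph boundary, while U+2028, U+0085 and newlines inside a paragraph are treated as ordinary line breaks that filling removes, with newlines inserted at the fill column. `fill_paragraphs` is a hypothetical name, and Python's `textwrap` stands in for Emacs's fill code; fill prefixes, sentence-end spacing, display, and the EOL-decoding step are all ignored.

```python
import textwrap

LS, PS, NEL = "\u2028", "\u2029", "\u0085"

def fill_paragraphs(text, width=70):
    """Hypothetical model of the proposed M-q behavior.

    U+2029 survives as a paragraph boundary.  Inside each paragraph, U+2028
    and U+0085 are treated like newlines, so filling removes them and
    re-breaks the text with plain newlines at the fill column.
    """
    filled = []
    for para in text.split(PS):
        # Normalize LS and NEL to "\n"; textwrap then turns "\n" into spaces
        # (replace_whitespace=True by default) before re-wrapping.
        flat = para.replace(LS, "\n").replace(NEL, "\n")
        filled.append(textwrap.fill(flat, width=width))
    return PS.join(filled)

sample = "one" + LS + "two three" + PS + "four" + LS + "five"
print(repr(fill_paragraphs(sample, width=10)))
```

Splitting on U+2029 first mirrors the proposal that M-q removes U+2028 but leaves U+2029 in place when filling a single paragraph.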
There are some compromises in these decisions, but they make the job
much easier and less intrusive, and I think they will advance the
level of our Unicode support quite a bit.
Comments?
I think we should also make $ match these two characters, in addition
to the newline, but that could be more difficult. Would someone who
knows their way in regex.c want to work on this part?
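One way to picture the suggested change to "$" is as an end-of-line assertion over a set of line terminators instead of just the newline. The sketch below uses Python's `re` lookahead purely as an illustration; the `EOL` pattern and the helper names are hypothetical, and regex.c would need its own implementation.

```python
import re

# Hypothetical end-of-line assertion: succeeds before "\n", U+2028, U+2029,
# or at the very end of the string (\Z in Python's re).
EOL = r"(?=[\n\u2028\u2029]|\Z)"

def words_at_eol(text):
    """Words immediately followed by any of the line terminators above."""
    return re.findall(r"\w+" + EOL, text)

def words_at_dollar(text):
    """Same search with the stock multiline '$', which only recognizes newline."""
    return re.findall(r"\w+$", text, re.MULTILINE)

sample = "foo\u2028bar\u2029baz\nqux"
print(words_at_eol(sample))
print(words_at_dollar(sample))
```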