GNU bug report logs - #25685
fill-paragraph vs. \n vs. Chinese / English boundaries

Package: emacs;

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Sat, 11 Feb 2017 00:23:01 UTC

Severity: minor

Tags: notabug

Merged with 25099

Done: Katsumi Yamaoka <yamaoka <at> jpl.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 25685 in the body.
You can then email your comments to 25685 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#25685; Package emacs. (Sat, 11 Feb 2017 00:23:01 GMT)

Acknowledgement sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 11 Feb 2017 00:23:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: bug-gnu-emacs <bug-gnu-emacs <at> gnu.org>
Cc: Katsumi Yamaoka <yamaoka <at> jpl.org>
Subject: fill-paragraph vs. \n vs. Chinese / English boundaries
Date: Sat, 11 Feb 2017 08:22:19 +0800
Big problem!
Put the cursor upon the following paragraph,

We proceeded to dig up 些 some Canna edulis Ker 食用美人蕉
when surprise surprise...

and do

M-h (translated from <escape> h) runs the command mark-paragraph
(found in global-map), which is an interactive compiled Lisp function
in ‘paragraphs.el’.

then

M-q (translated from <escape> q) runs the command fill-paragraph
(found in global-map), which is an interactive compiled Lisp function
in ‘fill.el’.

It becomes

We proceeded to dig up 些 some Canna edulis Ker 食用美人蕉when
surprise surprise...

I.e., the "食用美人蕉when" are now stuck together, making our text
look very unprofessional.

You might argue that that's just the way the ball bounces. But then I
would ask: what about "up 些 some"? You didn't glue that together, so
why glue "食用美人蕉when" together, on only one side, when removing the "\n"?

So what do I want? This:

We proceeded to dig up 些 some Canna edulis Ker 食用美人蕉 when
surprise surprise...




Merged 25099 25685. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 11 Feb 2017 02:39:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#25685; Package emacs. (Mon, 13 Feb 2017 00:23:02 GMT)

Message #10 received at 25685 <at> debbugs.gnu.org:

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 25685 <at> debbugs.gnu.org
Subject: Re: bug#25685: fill-paragraph vs. \n vs. Chinese / English boundaries
Date: Mon, 13 Feb 2017 09:22:46 +0900
[Message part 1 (text/plain, inline)]
On Sat, 11 Feb 2017 08:22:19 +0800, 積丹尼 Dan Jacobson wrote:
> We proceeded to dig up 些 some Canna edulis Ker 食用美人蕉
> when surprise surprise...

> and do  M-h  then  M-q
> It becomes

> We proceeded to dig up 些 some Canna edulis Ker 食用美人蕉when
> surprise surprise...

> I.e., the "食用美人蕉when" are now stuck together, making our text
> look very unprofessional.

Fixing such things one by one manually is my routine, too.  But
I first tried:

* lisp/textmodes/fill.el (fill-delete-newlines):
Don't delete leading and trailing space from CJK word.

I'm not sure if it is the right solution for every case, though.

[Message part 2 (text/x-patch, inline)]
--- fill.el~	2017-01-04 22:17:04.000000000 +0000
+++ fill.el	2017-02-12 23:57:42.946118200 +0000
@@ -494,8 +494,8 @@
 	    (replace-match (get-text-property (match-beginning 0) 'fill-space))
 	  (let ((prev (char-before (match-beginning 0)))
 		(next (following-char)))
-	    (if (and (or (aref (char-category-set next) ?|)
-			 (aref (char-category-set prev) ?|))
+	    (if (and (aref (char-category-set next) ?|)
+		     (aref (char-category-set prev) ?|)
 		     (or (aref fill-nospace-between-words-table next)
 			 (aref fill-nospace-between-words-table prev)))
 		(delete-char -1))))))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#25685; Package emacs. (Mon, 13 Feb 2017 23:14:01 GMT)

Message #13 received at 25685 <at> debbugs.gnu.org:

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 25685 <at> debbugs.gnu.org
Subject: Re: bug#25685: fill-paragraph vs. \n vs. Chinese / English boundaries
Date: Tue, 14 Feb 2017 08:13:19 +0900
On Mon, 13 Feb 2017 09:22:46 +0900, Katsumi Yamaoka wrote:
> I first tried:

> * lisp/textmodes/fill.el (fill-delete-newlines):
> Don't delete leading and trailing space from CJK word.

In Japan some people write an English word in Japanese text with
spaces, and others do it with no space:

日本語と English の混在
日本語とEnglishの混在
(Both read "a mix of Japanese and English".)

For the latter case, some word processors separate them with thin
spaces automatically, but Emacs doesn't (I'm not wrong, am I?).
So I belong to the former group.  Anyway, if the patch is applied, it
would inconvenience people in the latter group, e.g.:

寿限無寿限無五劫の擦り切れ海砂利水魚の水行末雲来末風来末
eating寝るplaceに住む処やぶら小路の藪柑子パイポパイポ...
↓
寿限無寿限無五劫の擦り切れ海砂利水魚の水行末雲来末風来末 eating寝る
placeに住む処やぶら小路の藪柑子パイポパイポ...

So there should probably be a user option whose default value
doesn't change the present behavior.  Is there a good name for it?

fill-multilingual-words-with-spaces
                  text?
fill-separate-words-with-spaces-in-multilingual-text
fill-words-with-spaces-in-multilingual-text

↓ shorter names, with the meaning described in the doc string
fill-words-with-spaces
fill-paragraph-with-spaces ← not necessarily used for paragraphs?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#25685; Package emacs. (Wed, 15 Feb 2017 01:33:02 GMT)

Message #16 received at 25685 <at> debbugs.gnu.org:

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 25685 <at> debbugs.gnu.org
Subject: Re: bug#25685: fill-paragraph vs. \n vs. Chinese / English boundaries
Date: Wed, 15 Feb 2017 10:32:01 +0900
On Tue, 14 Feb 2017 08:13:19 +0900, Katsumi Yamaoka wrote:
> So, there may want to be a user option, of which the default value
> doesn't change the present behavior.  Is there a good name for it?

I've installed it in the Emacs master
<http://lists.gnu.org/archive/html/emacs-diffs/2017-02/msg00185.html>
as:
,----[ C-h v fill-separate-heterogeneous-words-with-space RET ]
| fill-separate-heterogeneous-words-with-space is a variable defined in ‘fill.el’.
| Its value is t
| Original value was nil
| 
| Documentation:
| Non-nil means that use a space to separate words of different kind.
| This will be done with a word in the end of a line and a word in the
| beginning of the next line when concatenating them for filling those
| lines.  Whether to use a space is up to how the words are categorized.
| 
| You can customize this variable.
| 
| This variable was introduced, or its default value was changed, in
| version 26.1 of Emacs.
`----
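The new option defaults to nil, so the old behavior is unchanged unless you opt in. Below is a minimal init-file sketch for opting in. It assumes Emacs 26.1 or later, where the variable exists. The `text-mode-hook` variant is only an illustrative choice for people who want the space in some buffers but not others. The patch itself does not provide that variant.

```elisp
;; Sketch: opt in to a space between heterogeneous words (e.g. CJK and
;; Latin) when filling joins the end of one line to the start of the
;; next.  Assumes Emacs 26.1+, where
;; `fill-separate-heterogeneous-words-with-space' defaults to nil.

;; Option 1: enable it globally.
(setq fill-separate-heterogeneous-words-with-space t)

;; Option 2 (illustrative alternative): enable it only in buffers that
;; run `text-mode-hook', keeping the no-space behavior elsewhere.
(add-hook 'text-mode-hook
          (lambda ()
            (setq-local fill-separate-heterogeneous-words-with-space t)))
```

`M-x customize-variable RET fill-separate-heterogeneous-words-with-space RET` does the same thing interactively. The doc string notes that even with the option enabled, whether a space is actually inserted still depends on how the two boundary words are categorized.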




bug closed, send any further explanations to 25685 <at> debbugs.gnu.org and 積丹尼 Dan Jacobson <jidanni <at> jidanni.org> Request was from Katsumi Yamaoka <yamaoka <at> jpl.org> to control <at> debbugs.gnu.org. (Wed, 15 Feb 2017 22:13:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 16 Mar 2017 11:24:04 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.