GNU bug report logs - #22429
Force character to be recognized as LTR inside RTL paragraph


Package: emacs;

Reported by: "Filipe Moreira" <famoreira <at> gmail.com>

Date: Thu, 21 Jan 2016 21:15:02 UTC

Severity: normal

Tags: notabug

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22429 in the body.
You can then email your comments to 22429 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#22429; Package emacs. (Thu, 21 Jan 2016 21:15:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Filipe Moreira" <famoreira <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 21 Jan 2016 21:15:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Filipe Moreira" <famoreira <at> gmail.com>
To: <bug-gnu-emacs <at> gnu.org>
Subject: Force character to be recognized as LTR inside RTL paragraph
Date: Thu, 21 Jan 2016 13:14:22 -0800
Hi everyone,

I’m using Emacs as a LaTeX editor, with the AUCTeX mode. One document I’m authoring is written in English with some paragraphs in Hebrew or Greek. 

The issue I have is with mixing neutral characters that need to be LTR inside a paragraph that is RTL. An example is the backslash character (i.e. ‘\’) that LaTeX uses to introduce its commands. Inside an RTL paragraph, I would ideally like to force Emacs to always interpret the backslash, as well as the opening and closing braces (i.e. {}), as LTR.

This is not what happens at the moment. A visual representation of the problem is here:
http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex

Is it possible to whitelist some characters that should always be interpreted as LTR?

Thanks

Filipe Moreira

-- 

Freelance Web Developer(Ruby & Javascript)

http://coderelax.com/

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22429; Package emacs. (Fri, 22 Jan 2016 08:08:01 GMT) Full text and rfc822 format available.

Message #8 received at 22429 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Filipe Moreira" <famoreira <at> gmail.com>
Cc: 22429 <at> debbugs.gnu.org
Subject: Re: bug#22429: Force character to be recognized as LTR inside RTL
 paragraph
Date: Fri, 22 Jan 2016 10:08:06 +0200
> Date: Thu, 21 Jan 2016 13:14:22 -0800
> From: "Filipe Moreira" <famoreira <at> gmail.com>
> 
> I’m using Emacs as a LaTeX editor, with the AUCTeX mode. One document I’m
> authoring is written in English with some paragraphs in Hebrew or Greek. 
> 
> The issue I have is with mixing some neutral characters that need to be LTR,
> inside a paragraph which is RTL. An example of this is the backslash (i.e. ‘\’)
> character used by LaTeX to signal its commands. Inside an RTL paragraph I
> ideally want to force Emacs to always interpret the backslash character, as
> well as the open and close brackets (i.e. {}) as LTR.
> 
> This is not what happens at the moment. Here I have a visual representation of
> the problem:
> http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex.
> 
> Is it possible to whitelist some characters that should always be interpreted
> as LTR?

The directionality of characters is determined by their bidirectional
class property as defined by the Unicode Character Database.  Emacs
uses those definitions in its implementation of the UBA, the Unicode
Bidirectional Algorithm, when it lays out text for display.
Punctuation characters, such as \, {, and } have "weak
directionality": they take the directionality of the surrounding text,
and if the directionality on either side is different, they default to
the paragraph's base direction, which is RTL in your case.  So that is
what you see.
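
To make the point about bidirectional classes concrete, here is a small
sketch that looks up the class the Unicode Character Database records for
the characters under discussion.  It uses Python's standard unicodedata
module rather than Emacs itself, so it only illustrates the database
values, not how Emacs applies them; the helper name bidi_class is
hypothetical.  The database lists \, {, and } as ON (Other Neutral),
which is why they have no direction of their own.

```python
import unicodedata

# Characters discussed in this thread, plus some directional controls.
SAMPLES = {
    "backslash": "\\",
    "left brace": "{",
    "right brace": "}",
    "Hebrew letter bet": "\u05D1",
    "Latin letter a": "a",
    "LRM": "\u200E",
    "LRE": "\u202A",
    "PDF": "\u202C",
}


def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"U+{ord(ch):04X} {name}: {bidi_class(ch)}")
```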

Emacs being Emacs, you can programmatically change the bidirectional
class of every character, but that change has global effect: it will
affect the directionality of that character everywhere in the Emacs
session.  So this is not recommended.

The correct solution to these problems is to wrap the footnote block
in the LRE..PDF or LRI..PDI control characters, so that the footnote
is rendered independently of the surrounding bidirectional context.
See the example below.  Not sure if LaTeX will DTRT with directional
control characters, but if it doesn't, that's a bug/misfeature in
LaTeX.

\begin{hebrew}
  \pstart

בְּרֵאשִׁ֖ית‪\footnoteA{This is a Hebrew related footnote}‬ בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

  \pend
\end{hebrew}
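
The wrapping shown above can also be done programmatically.  The
following is a minimal sketch, assuming the text is processed outside
Emacs as a plain Python string; the function name wrap_ltr and the
unpointed Hebrew sample words are illustrative choices, not anything
from AUCTeX or Emacs.  Whether LaTeX accepts these control characters
in its input is a separate question that this sketch does not address.

```python
# Directional formatting characters defined by Unicode.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def wrap_ltr(text: str, isolate: bool = False) -> str:
    """Wrap text in LRE..PDF, or in LRI..PDI when isolate is true."""
    opener, closer = (LRI, PDI) if isolate else (LRE, PDF)
    return f"{opener}{text}{closer}"


if __name__ == "__main__":
    footnote = r"\footnoteA{This is a Hebrew related footnote}"
    # Unpointed Hebrew words written as escapes to keep the source unambiguous.
    first_word = "\u05D1\u05E8\u05D0\u05E9\u05D9\u05EA"
    second_word = "\u05D1\u05E8\u05D0"
    line = first_word + wrap_ltr(footnote) + " " + second_word
    # Show the code points so the invisible controls can be checked.
    print([f"U+{ord(c):04X}" for c in line if not c.isascii()])
```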

Another possibility is to insert newlines between the footnote and the
surrounding text, as shown below.  Not sure if LaTeX will be happy
with that, and I think it's uglier anyway.

\begin{hebrew}
  \pstart

בְּרֵאשִׁ֖ית

\footnoteA{This is a Hebrew related footnote}

בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

  \pend
\end{hebrew}

I don't think there's a bug to fix here, so I'm going to close this
bug report.  Any objections?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22429; Package emacs. (Fri, 22 Jan 2016 08:25:02 GMT) Full text and rfc822 format available.

Message #11 received at 22429 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: famoreira <at> gmail.com
Cc: 22429 <at> debbugs.gnu.org
Subject: Re: bug#22429: Force character to be recognized as LTR inside RTL
 paragraph
Date: Fri, 22 Jan 2016 10:24:14 +0200
> Date: Fri, 22 Jan 2016 10:08:06 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 22429 <at> debbugs.gnu.org
> 
> The correct solution to these problems is to wrap the footnote block
> in the LRE..PDF or LRI..PDI control characters, so that the footnote
> is rendered independently of the surrounding bidirectional context.

Actually, LRM should also work, you just need to put it on both sides
of the footnote, like below:

\begin{hebrew}
  \pstart

בְּרֵאשִׁ֖ית‎\footnoteA{This is a Hebrew related footnote}‎ בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

  \pend
\end{hebrew}
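
For a document with many such commands, placing the marks by hand gets
tedious.  Below is a rough sketch of automating it on a Python string,
under the assumption that each command is a backslash, a run of ASCII
letters, and a single brace group with no nested braces; the pattern
and the name guard_commands are hypothetical, and real LaTeX input
would need a more careful parser.

```python
import re

LRM = "\u200E"  # LEFT-TO-RIGHT MARK

# Assumed shape of a command: \name{argument}, with no nested braces.
COMMAND = re.compile(r"\\[A-Za-z]+\{[^{}]*\}")


def guard_commands(text: str) -> str:
    """Put an LRM on both sides of every match of COMMAND in text."""
    return COMMAND.sub(lambda m: f"{LRM}{m.group(0)}{LRM}", text)


if __name__ == "__main__":
    sample = "\u05D0\\footnoteA{note}\u05D1"
    guarded = guard_commands(sample)
    # Print code points of the non-ASCII characters to verify placement.
    print([f"U+{ord(c):04X}" for c in guarded if not c.isascii()])
```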




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22429; Package emacs. (Fri, 22 Jan 2016 09:33:02 GMT) Full text and rfc822 format available.

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#22429: Force character to be recognized as LTR inside RTL
 paragraph
Date: Fri, 22 Jan 2016 09:31:39 +0000
On Fri 22 Jan 2016, Eli Zaretskii wrote:

>> Date: Fri, 22 Jan 2016 10:08:06 +0200
>> From: Eli Zaretskii <eliz <at> gnu.org>
>> Cc: 22429 <at> debbugs.gnu.org
>> 
>> The correct solution to these problems is to wrap the footnote block
>> in the LRE..PDF or LRI..PDI control characters, so that the footnote
>> is rendered independently of the surrounding bidirectional context.
>
> Actually, LRM should also work, you just need to put it on both sides
> of the footnote, like below:
>
> \begin{hebrew}
>   \pstart
>
> בְּרֵאשִׁ֖ית‎\footnoteA{This is a Hebrew related footnote}‎ בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
>
>   \pend
> \end{hebrew}

While reading this message, I noticed odd behaviour of cursor motion
with <right> and <left> (i.e. right-char and left-char). 

I would expect repeated <right> to move in logical order until the end
of the buffer, but it gets stuck on the newline after "\pstart".
Likewise repeated <left> from the end gets stuck at the newline before
"\pend".

Saving this text in a file "foo.txt" showed the same behaviour (using the
latest emacs-25 branch with "emacs -Q"). Is this expected ?

    AndyM





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22429; Package emacs. (Fri, 22 Jan 2016 11:56:02 GMT) Full text and rfc822 format available.

Message #17 received at 22429 <at> debbugs.gnu.org (full text, mbox):

From: Filipe Moreira <famoreira <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 22429 <at> debbugs.gnu.org
Subject: Re: bug#22429: Force character to be recognized as LTR inside RTL
 paragraph
Date: Fri, 22 Jan 2016 11:54:45 +0000
Hi Eli,

Thank you for taking the time to look into this.

On Fri, Jan 22, 2016 at 8:08 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:

> > Date: Thu, 21 Jan 2016 13:14:22 -0800
> > From: "Filipe Moreira" <famoreira <at> gmail.com>
> >
> > I’m using Emacs as a LaTeX editor, with the AUCTeX mode. One document I’m
> > authoring is written in English with some paragraphs in Hebrew or Greek.
> >
> > The issue I have is with mixing some neutral characters that need to be LTR,
> > inside a paragraph which is RTL. An example of this is the backslash (i.e. ‘\’)
> > character used by LaTeX to signal its commands. Inside an RTL paragraph I
> > ideally want to force Emacs to always interpret the backslash character, as
> > well as the open and close brackets (i.e. {}) as LTR.
> >
> > This is not what happens at the moment. Here I have a visual representation of
> > the problem:
> > http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex
> >
> > Is it possible to whitelist some characters that should always be interpreted
> > as LTR?
>
> The directionality of characters is determined by their bidirectional
> class property as defined by the Unicode Character Database.  Emacs
> uses those definitions in its implementation of the UBA, the Unicode
> Bidirectional Algorithm, when it lays out text for display.
> Punctuation characters, such as \, {, and } have "weak
> directionality": they take the directionality of the surrounding text,
> and if the directionality on either side is different, they default to
> the paragraph's base direction, which is RTL in your case.  So that is
> what you see.
>
> Emacs being Emacs, you can programmatically change the bidirectional
> class of every character, but that change has global effect: it will
> affect the directionality of that character everywhere in the Emacs
> session.  So this is not recommended.
>

Although this is not recommended, I would be willing to have the bidi class
property of some characters set to left-to-right, such as the backslash
character. Can you point me to some documentation on this? I saw the
get-char-code-property function but could not find any way to actually
change the setting.


>
> The correct solution to these problems is to wrap the footnote block
> in the LRE..PDF or LRI..PDI control characters, so that the footnote
> is rendered independently of the surrounding bidirectional context.
> See the example below.  Not sure if LaTeX will DTRT with directional
> control characters, but if it doesn't, that's a bug/misfeature in
> LaTeX.
>
> \begin{hebrew}
>   \pstart
>
> בְּרֵאשִׁ֖ית‪\footnoteA{This is a Hebrew related footnote}‬ בָּרָ֣א
> אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
>
>   \pend
> \end{hebrew}
>

In this example the direction of the surrounding Hebrew text has been
changed. The word בְּרֵאשִׁ֖ית should come before (i.e. to the right of)
the word בָּרָ֣א. So while the footnote command is correctly shown as LTR,
the Hebrew text has been changed. I don't think this is the expected
result. See the updated image
(http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex)
that shows TextEdit's correct handling of this.


>
> Another possibility is to insert newlines between the footnote and the
> surrounding text, as shown below.  Not sure if LaTeX will be happy
> with that, and I think it's uglier anyway.
>
> \begin{hebrew}
>   \pstart
>
> בְּרֵאשִׁ֖ית
>
> \footnoteA{This is a Hebrew related footnote}
>
> בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
>
>   \pend
> \end{hebrew}
>

Unfortunately for my use case this is not possible.

>
> I don't think there's a bug to fix here, so I'm going to close this
> bug report.  Any objections?
>

Is there any chance of having a way to set the Unicode bidirectionality of
a character within each separate mode? Could this be considered a feature?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22429; Package emacs. (Fri, 22 Jan 2016 14:02:01 GMT) Full text and rfc822 format available.

Message #20 received at 22429 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Filipe Moreira <famoreira <at> gmail.com>
Cc: 22429 <at> debbugs.gnu.org
Subject: Re: bug#22429: Force character to be recognized as LTR inside RTL
 paragraph
Date: Fri, 22 Jan 2016 16:01:07 +0200
> From: Filipe Moreira <famoreira <at> gmail.com>
> Date: Fri, 22 Jan 2016 11:54:45 +0000
> Cc: 22429 <at> debbugs.gnu.org
> 
>     Emacs being Emacs, you can programmatically change the bidirectional
>     class of every character, but that change has global effect: it will
>     affect the directionality of that character everywhere in the Emacs
>     session. So this is not recommended.
> 
> Although this is not recommended, I would be willing to have the bidi class
> property of some characters set to left-to-right, like the example of the
> backslash character.

Can you tell why?  There are ways to produce the display you expect
without changing the character properties; I described 3 such ways.
If you change the properties, the text will only display correctly on
your system; any other user who displays your text, either in Emacs or
in another editor that supports bidirectional display, will see the text
in the same jumbled order you wanted to avoid.  So I see very little
sense in such changes.

> Can you point somewhere regarding this? I saw the
> get-char-code-property function but could not find any way to
> actually change the setting.

You want put-char-code-property.  Again, I very much recommend not to
do that.

>     \begin{hebrew}
>     \pstart
>     
>     בְּרֵאשִׁ֖ית‪\footnoteA{This is a Hebrew related footnote}‬ בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת
>     הָאָֽרֶץ׃
>     
>     \pend
>     \end{hebrew}
>     
> 
> In this example the direction of the surrounding Hebrew text has been changed.
> The word בְּרֵאשִׁ֖ית should come before (i.e. to the right of) the word בָּרָ֣א. So
> while the footnote command is correctly shown as LTR, the Hebrew text has been
> changed. I don't think this is the expected result. See the updated image
> (http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex)
> that shows TextEdit's correct handling of this.

What version of Emacs do you have?  The above renders correctly for
me, both in Emacs 24.5 and in the development version.  The word
בְּרֵאשִׁ֖ית is shown to the right of the footnote, and all the rest is
shown to the left of it.  Maybe you have an older Emacs which somehow
has a bug?

> Is there any chance of having a way to set the Unicode bidirectionality of a
> character within each separate mode? Could this be considered a feature?

I think it would be a misfeature, for the reasons explained above.
It's the same as using a private font to display some character in a
different shape -- you are the only one who will enjoy that shape.

However, nothing prevents a mode from using put-char-code-property in
some ingenious ways to do what you want.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22429; Package emacs. (Fri, 22 Jan 2016 14:04:02 GMT) Full text and rfc822 format available.

Message #23 received at 22429 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 22429 <at> debbugs.gnu.org
Subject: Re: bug#22429: Force character to be recognized as LTR inside RTL
 paragraph
Date: Fri, 22 Jan 2016 16:03:48 +0200
> From: Andy Moreton <andrewjmoreton <at> gmail.com>
> Date: Fri, 22 Jan 2016 09:31:39 +0000
> 
> While reading this message, I noticed odd behaviour of cursor motion
> with <right> and <left> (i.e. right-char and left-char). 
> 
> I would expect repeated <right> to move in logical order until the end
> of the buffer, but it gets stuck on the newline after "\pstart".
> Likewise repeated <left> from the end gets stuck at the newline before
> "\pend".
> 
> Saving this text in a file "foo.txt" showed the same behaviour (using the
> latest emacs-25 branch with "emacs -Q"). Is this expected ?

Yes, expected.  The paragraph direction changes when you enter a
paragraph that has a different base direction, and the arrow keys are
sensitive to the paragraph base direction.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#22429; Package emacs. (Fri, 22 Jan 2016 15:17:02 GMT) Full text and rfc822 format available.

Message #26 received at 22429 <at> debbugs.gnu.org (full text, mbox):

From: Filipe Moreira <famoreira <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 22429 <at> debbugs.gnu.org
Subject: Re: bug#22429: Force character to be recognized as LTR inside RTL
 paragraph
Date: Fri, 22 Jan 2016 15:15:51 +0000
On Fri, Jan 22, 2016 at 2:01 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: Filipe Moreira <famoreira <at> gmail.com>
> > Date: Fri, 22 Jan 2016 11:54:45 +0000
> > Cc: 22429 <at> debbugs.gnu.org
> >
> >     Emacs being Emacs, you can programmatically change the bidirectional
> >     class of every character, but that change has global effect: it will
> >     affect the directionality of that character everywhere in the Emacs
> >     session. So this is not recommended.
> >
> > Although this is not recommended, I would be willing to have the bidi class
> > property of some characters set to left-to-right, like the example of the
> > backslash character.
>
> Can you tell why?  There are ways to produce the display you expect
> without changing the character properties; I described 3 such ways.
> If you change the properties, the text will only display correctly on
> your system; any other user who displays your text, either in Emacs or
> in another editor that supports bidirectional display, will see the text
> in the same jumbled order you wanted to avoid.  So I see very little
> sense in such changes.
>
> > Can you point somewhere regarding this? I saw the
> > get-char-code-property function but could not find any way to
> > actually change the setting.
>
> You want put-char-code-property.  Again, I very much recommend not to
> do that.
>
> >     \begin{hebrew}
> >     \pstart
> >
> >     בְּרֵאשִׁ֖ית‪\footnoteA{This is a Hebrew related footnote}‬ בָּרָ֣א
> אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת
> >     הָאָֽרֶץ׃
> >
> >     \pend
> >     \end{hebrew}
> >
> >
> > In this example the direction of the surrounding Hebrew text has been changed.
> > The word בְּרֵאשִׁ֖ית should come before (i.e. to the right of) the word בָּרָ֣א. So
> > while the footnote command is correctly shown as LTR, the Hebrew text has been
> > changed. I don't think this is the expected result. See the updated image
> > (http://emacs.stackexchange.com/questions/19696/handling-left-to-right-inside-right-to-left-paragraphs-using-emacs-and-auctex)
> > that shows TextEdit's correct handling of this.
>
> What version of Emacs do you have?  The above renders correctly for
> me, both in Emacs 24.5 and in the development version.  The word
> בְּרֵאשִׁ֖ית is shown to the right of the footnote, and all the rest is
> shown to the left of it.  Maybe you have an older Emacs which somehow
> has a bug?
>

I have just tested wrapping the footnote command in LRM (on both ends)
in a clean Emacs 24.5.1 (started with -Q) and it worked! This wasn't
working in my normal environment, so I will need to investigate why that is.


>
> > Is there any chance of having a way to set the Unicode bidirectionality of a
> > character within each separate mode? Could this be considered a feature?
>
> I think it would be a misfeature, for the reasons explained above.
> It's the same as using a private font to display some character in a
> different shape -- you are the only one who will enjoy that shape.
>
> However, nothing prevents a mode from using put-char-code-property in
> some ingenious ways to do what you want.
>

I appreciate your help. This is all new to me and I've already learned a
lot from you and others regarding this. Thank you for making Emacs so
great.

Added tag(s) notabug. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 24 Jan 2016 02:36:01 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 22429 <at> debbugs.gnu.org and "Filipe Moreira" <famoreira <at> gmail.com> Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 24 Jan 2016 02:36:01 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 21 Feb 2016 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 9 years and 119 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.