GNU bug report logs -
#41005
problem with rendering Persian text in Emacs 27
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 41005 in the body.
You can then email your comments to 41005 AT debbugs.gnu.org in the normal way.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Fri, 01 May 2020 18:34:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
hossein valizadeh <valizadeh.ho <at> gmail.com>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Fri, 01 May 2020 18:34:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
I have been using GNU Emacs for a few years now, and I recently decided
to build Emacs 27.0.91 from the emacs-27 branch. I use Arch GNU/Linux,
along with the i3 window manager.
I have noticed a new problem with rendering Persian text in Emacs 27,
which to my knowledge was not present in earlier Emacs versions. If you
see the screenshots I have attached, some text seems to get garbled at
random. The same words sometimes appear correctly and sometimes do not.
I recall John Wiegley mentioning in his EmacsConf 2019 talk that Emacs
has recently started using the HarfBuzz text shaping engine. I wonder
if that's relevant and possibly the cause of this issue?
Thanks in advance for looking into this. I really hope the issue can be
resolved, as I use Emacs and Org extensively for writing my documents,
and this bug effectively renders Emacs unusable for me. I'll be happy
to try and provide any further information needed to debug this.
Regards,
Hossein Valizadeh
[Message part 2 (text/html, inline)]
[scr-1.png (image/png, attachment)]
[latest-screenshot.png (image/png, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Fri, 01 May 2020 18:53:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: hossein valizadeh <valizadeh.ho <at> gmail.com>
> Date: Fri, 1 May 2020 22:32:31 +0430
>
> I have noticed a new problem with rendering Persian text in Emacs 27,
> which to my knowledge was not present in earlier Emacs versions. If you
> see the screenshots I have attached, some text seems to get garbled at
> random. The same words sometimes appear correctly and sometimes do not.
> I recall John Wiegley mentioning in his EmacsConf 2019 talk that Emacs
> has recently started using the HarfBuzz text shaping engine. I wonder
> if that's relevant and possibly the cause of this issue?
I don't know. Was your Emacs built with HarfBuzz? You didn't attach
the data collected by "M-x report-emacs-bug", so I cannot know.
I suggest to start by attaching a sample file that exhibits the
problem, as text, not as an image, and then showing what it looked
like in previous versions of Emacs and what it looks like in Emacs
27.0.91 on your system. Also, please tell which font was in use when
the problematic display was produced. Then we can look into this
issue and see what could have caused it.
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Wed, 03 Jun 2020 14:35:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 41005 <at> debbugs.gnu.org (full text, mbox):
[Please keep the bug address on the CC list; use "Reply All".]
> From: hossein valizadeh <valizadeh.ho <at> gmail.com>
> Date: Wed, 3 Jun 2020 09:45:09 +0430
>
> The Emacs I'm using was compiled with HarfBuzz. I have also attached a video file that shows the
> problem, for a better understanding.
>
> I checked my config file line by line, in different modes and in combination with different commands.
> I think the text gets garbled in one of the following two situations:
>
> 1. auto-fill-mode is on
> 2. column-number-mode is on (or I'm using the %c construct in a custom mode line)
>
> # These two seem to be related, as auto-fill-mode also counts the characters in each line.
> # I tried these in a plain, unconfigured Emacs; the results were the same and the characters got garbled.
> # The font in use makes no difference. (In plain Emacs I used the "DejaVu Sans Mono" font.)
> # Scaling the buffer text may make one word display correctly, but other words are still garbled.
> # It mostly happens when I'm adding something to an existing line (and sometimes it just happens
> on its own).
> # I have disabled these two options in my config file, and Emacs has worked properly since then.
>
> *** I noticed another bug:
>
> When I change the input language to Persian (or any other language) with the set-input-method command,
> scale the text with "C-x +", and then start typing, the first character ignores the input method and an
> English character appears in the buffer. This does not happen for the following characters.
I asked you to please provide a test file with text that you think
is displayed incorrectly, and images that show how it looks in
Emacs 27 and in Emacs 26:
> I suggest to start by attaching a sample file that exhibits the
> problem, as text, not as an image, and then showing what it looked
> like in previous versions of Emacs and what it looks like in Emacs
> 27.0.91 on your system. Also, please tell which font was in use when
> the problematic display was produced. Then we can look into this
> issue and see what could have caused it.
Please provide that stuff, it is important to make progress with this
bug report. TIA.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Wed, 03 Jun 2020 17:25:02 GMT)
Full text and
rfc822 format available.
Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):
Not certain, but this sounds like it could be related to bug#37683
[1]. That report was closed because it does not seem to be an EWW
problem, but the issue still shows up for me.
To recap, the problem there is that Arabic script letters, which are
typically joined together in a cursive style, show up separated and
unconnected, if and only if text scaling is at zero. If the text is
made larger or smaller, the text becomes properly joined. This leads
me to suspect that it is not a Harfbuzz issue, because the text is
displayed correctly under some conditions but not under others.
[1] https://lists.gnu.org/archive/html/bug-gnu-emacs/2019-10/msg00921.html
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Wed, 03 Jun 2020 18:03:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Nicholas Drozd <nicholasdrozd <at> gmail.com>
> Date: Wed, 3 Jun 2020 12:24:30 -0500
>
> Not certain, but this sounds like it could be related to bug#37683
> [1]. That report was closed because it does not seem to be an EWW
> problem, but the issue still shows up for me.
>
> To recap, the problem there is that Arabic script letters, which are
> typically joined together in a cursive style, show up separated and
> unconnected, if and only if text scaling is at zero. If the text is
> made larger or smaller, the text becomes properly joined. This leads
> me to suspect that it is not a Harfbuzz issue, because the text is
> displayed correctly under some conditions but not under others.
It happens to you with any font that supports Arabic, or just with
some?
And if you change the scale, then change it back to zero, does the
problem happen again, or does it only happen when you look at some
text for the first time?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Thu, 04 Jun 2020 02:38:01 GMT)
Full text and
rfc822 format available.
Message #20 received at 41005 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
1. This happens with every font that supports Arabic.
2. When I scale back to zero, the problem happens again.
On Wed, Jun 3, 2020 at 10:32 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Nicholas Drozd <nicholasdrozd <at> gmail.com>
> > Date: Wed, 3 Jun 2020 12:24:30 -0500
> >
> > Not certain, but this sounds like it could be related to bug#37683
> > [1]. That report was closed because it does not seem to be an EWW
> > problem, but the issue still shows up for me.
> >
> > To recap, the problem there is that Arabic script letters, which are
> > typically joined together in a cursive style, show up separated and
> > unconnected, if and only if text scaling is at zero. If the text is
> > made larger or smaller, the text becomes properly joined. This leads
> > me to suspect that it is not a Harfbuzz issue, because the text is
> > displayed correctly under some conditions but not under others.
>
> It happens to you with any font that supports Arabic, or just with
> some?
>
> And if you change the scale, then change it back to zero, does the
> problem happen again, or does it only happen when you look at some
> text for the first time?
>
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Thu, 04 Jun 2020 03:00:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 41005 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
1. This happens with every font that supports Arabic.
2. When I scale back to zero, the problem happens again.
I attached my emacs-bug-info.
On Thu, Jun 4, 2020 at 7:09 AM hossein valizadeh <valizadeh.ho <at> gmail.com>
wrote:
> Hello,
>
> 1. This happen with every font that supports Arabic.
> 2. When scale back to zero the problem happen again.
>
> On Wed, Jun 3, 2020 at 10:32 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
>
>> > From: Nicholas Drozd <nicholasdrozd <at> gmail.com>
>> > Date: Wed, 3 Jun 2020 12:24:30 -0500
>> >
>> > Not certain, but this sounds like it could be related to bug#37683
>> > [1]. That report was closed because it does not seem to be an EWW
>> > problem, but the issue still shows up for me.
>> >
>> > To recap, the problem there is that Arabic script letters, which are
>> > typically joined together in a cursive style, show up separated and
>> > unconnected, if and only if text scaling is at zero. If the text is
>> > made larger or smaller, the text becomes properly joined. This leads
>> > me to suspect that it is not a Harfbuzz issue, because the text is
>> > displayed correctly under some conditions but not under others.
>>
>> It happens to you with any font that supports Arabic, or just with
>> some?
>>
>> And if you change the scale, then change it back to zero, does the
>> problem happen again, or does it only happen when you look at some
>> text for the first time?
>>
>
[Message part 2 (text/html, inline)]
[Emacs-bug-info.txt (text/plain, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Thu, 04 Jun 2020 04:02:01 GMT)
Full text and
rfc822 format available.
Message #26 received at 41005 <at> debbugs.gnu.org (full text, mbox):
On June 4, 2020 6:01:21 AM GMT+03:00, hossein valizadeh <valizadeh.ho <at> gmail.com> wrote:
> Hello,
>
> 1. This happen with every font that supports Arabic.
> 2. When scale back to zero the problem happen again.
>
> I attached my emacs-bug-info.
>
>
> On Thu, Jun 4, 2020 at 7:09 AM hossein valizadeh
> <valizadeh.ho <at> gmail.com>
> wrote:
>
> > Hello,
> >
> > 1. This happen with every font that supports Arabic.
> > 2. When scale back to zero the problem happen again.
> >
Thanks.
What is your version of HarfBuzz?
And what happens if you unset XMODIFIERS, i.e. disable the ibus input method framework?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Thu, 04 Jun 2020 04:11:01 GMT)
Full text and
rfc822 format available.
Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):
On June 4, 2020 7:01:30 AM GMT+03:00, Eli Zaretskii <eliz <at> gnu.org> wrote:
> On June 4, 2020 6:01:21 AM GMT+03:00, hossein valizadeh
> <valizadeh.ho <at> gmail.com> wrote:
> > Hello,
> >
> > 1. This happen with every font that supports Arabic.
> > 2. When scale back to zero the problem happen again.
> >
> > I attached my emacs-bug-info.
> >
> >
> > On Thu, Jun 4, 2020 at 7:09 AM hossein valizadeh
> > <valizadeh.ho <at> gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > 1. This happen with every font that supports Arabic.
> > > 2. When scale back to zero the problem happen again.
> > >
>
>
> Thanks.
>
> What is your version of HarfBuzz?
> And what happens if you unset XMODIFIERS, i.e. disable the ibus input
> method framework?
Also, please go to the problematic place in the text and type "C-u C-x =", then post everything that Emacs shows in the *Help* buffer as result. Please do this both at text scale zero, when shaping is incorrect, and at non-zero scale, and post the contents of *Help* in both cases.
Screenshots of both displays as well as the text of the buffer used for these experiments will also help, as mentioned earlier
And finally, please try this with the version on the master branch, where a few fixes were installed lately.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Thu, 04 Jun 2020 04:12:01 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Thu, 04 Jun 2020 06:26:02 GMT)
Full text and
rfc822 format available.
Message #35 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
The HarfBuzz version is 2.6.7.
Setting XMODIFIERS='' makes no difference.
The text used in these screenshots is https://time.ir viewed in EWW. (The
*Help* buffer content is shown in the screenshots.)
I will try the master branch and report the result.
On Thu, Jun 4, 2020 at 8:40 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> On June 4, 2020 7:01:30 AM GMT+03:00, Eli Zaretskii <eliz <at> gnu.org> wrote:
> > On June 4, 2020 6:01:21 AM GMT+03:00, hossein valizadeh
> > <valizadeh.ho <at> gmail.com> wrote:
> > > Hello,
> > >
> > > 1. This happen with every font that supports Arabic.
> > > 2. When scale back to zero the problem happen again.
> > >
> > > I attached my emacs-bug-info.
> > >
> > >
> > > On Thu, Jun 4, 2020 at 7:09 AM hossein valizadeh
> > > <valizadeh.ho <at> gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > 1. This happen with every font that supports Arabic.
> > > > 2. When scale back to zero the problem happen again.
> > > >
> >
> >
> > Thanks.
> >
> > What is your version of HarfBuzz?
> > And what happens if you unset XMODIFIERS, i.e. disable the ibus input
> > method framework?
>
> Also, please go to the problematic place in the text and type "C-u C-x =",
> then post everything that Emacs shows in the *Help* buffer as result.
> Please do this both at text scale zero, when shaping is incorrect, and at
> non-zero scale, and post the contents of *Help* in both cases.
>
> Screenshots of both displays as well as the text of the buffer used for
> these experiments will also help, as mentioned earlier
>
> And finally, please try this with the version on the master branch, where
> a few fixes were installed lately.
>
[Message part 2 (text/html, inline)]
[zero-scale.png (image/png, attachment)]
[none-zero-scale.png (image/png, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Thu, 04 Jun 2020 08:29:01 GMT)
Full text and
rfc822 format available.
Message #41 received at 41005 <at> debbugs.gnu.org (full text, mbox):
hossein valizadeh <valizadeh.ho <at> gmail.com> writes:
> And finally, please try this with the version on the master branch, where a few fixes were
> installed lately.
I can reproduce the very similar issue described (Farsi Wikipedia entry
for Emacs), on current master. I believe I've figured it out, but I can
also debug further if required.
What happens is that font-shape-gstring is called with direction == L2R,
mis-shapes the text, then caches that version in the composition gstring
cache. The cache doesn't distinguish directions, and it's never cleared,
so this "infects" other buffers, but only if they're opened afterwards,
and only for the same words.
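To illustrate the failure mode described above, here is a minimal standalone sketch in C. It is not Emacs code: the cache layout and every name in it (cache_entry, cache_lookup, shape_cached) are hypothetical, and "shaping" is reduced to remembering which direction was used. The only point it makes is that a cache keyed on the text alone keeps returning the first result, whatever direction is requested later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in Emacs.  */
enum direction { DIR_L2R, DIR_R2L };

struct cache_entry {
  char text[32];              /* the word that was shaped */
  enum direction shaped_as;   /* direction used when it was first shaped */
  int used;
};

#define CACHE_SIZE 8
static struct cache_entry cache[CACHE_SIZE];

/* Look up TEXT only; the requested direction is not part of the key.  */
static struct cache_entry *
cache_lookup (const char *text)
{
  for (size_t i = 0; i < CACHE_SIZE; i++)
    if (cache[i].used && strcmp (cache[i].text, text) == 0)
      return &cache[i];
  return NULL;
}

/* Return the direction the displayed glyphs were shaped with.  */
enum direction
shape_cached (const char *text, enum direction requested)
{
  struct cache_entry *e = cache_lookup (text);
  if (e)
    return e->shaped_as;      /* cached result wins; REQUESTED is ignored */

  /* Cache miss: "shape" with the requested direction and remember it.  */
  for (size_t i = 0; i < CACHE_SIZE; i++)
    if (!cache[i].used)
      {
        assert (strlen (text) < sizeof cache[i].text);
        strcpy (cache[i].text, text);
        cache[i].shaped_as = requested;
        cache[i].used = 1;
        break;
      }
  return requested;
}
```

In this model, calling shape_cached first with DIR_L2R and then with DIR_R2L for the same word yields DIR_L2R both times, while a word that was never shaped as L2R is unaffected. That matches the "only for the same words" observation.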
shr appears to force bidi-display-reordering off. Removing that let
binding from shr-insert-document avoids the problem.
We should consider adding direction to our gstrings, on master. While
we're there, let's also add script, language, and harfbuzz features to
the gstrings so we know we've captured the precise harfbuzz parameters?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Thu, 04 Jun 2020 13:16:01 GMT)
Full text and
rfc822 format available.
Message #44 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Pip Cet <pipcet <at> gmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 41005 <at> debbugs.gnu.org,
> nicholasdrozd <at> gmail.com
> Date: Thu, 04 Jun 2020 08:28:24 +0000
>
> What happens is that font-shape-gstring is called with direction == L2R,
> mis-shapes the text, then caches that version in the composition gstring
> cache. The cache doesn't distinguish directions, and it's never cleared,
> so this "infects" other buffers, but only if they're opened afterwards,
> and only for the same words.
>
> shr appears to force bidi-display-reordering off. Removing that let
> binding from shr-insert-document avoids the problem.
Right, thanks. When I added that binding of bidi-display-reordering,
we didn't yet have HarfBuzz, and the other shapers' Arabic shaping
wasn't affected by the local text direction like HarfBuzz is.
> We should consider adding direction to our gstrings, on master. While
> we're there, let's also add script, language, and harfbuzz features to
> the gstrings so we know we've captured the precise harfbuzz parameters?
On emacs-27, I can fix this by a simpler band-aid below.
On master, I think we should indeed add direction to the cached
gstrings, as there could be other much more subtle situations where
looking at bidi-display-reordering is not enough, and we could then
still cache a composition with the wrong direction. Patches welcome.
As for script and language, for now adding them would just waste
memory, since we don't yet have any meaningful support for
buffer-local, let-alone paragraph-local, scripts and languages. When
we added HarfBuzz support, I considered adding some functionality to
determine language and/or script from the codepoints, but decided that
it made little sense, since HarfBuzz already does exactly that in
hb_buffer_guess_segment_properties. So I think we should add to the
FIXME in hbfont.c the fact that when we do support those internally in
Emacs, we should add these attributes to cached gstrings, but not yet
actually add them for now.
Here's the patch I propose for emacs-27:
diff --git a/src/hbfont.c b/src/hbfont.c
index 576c5fe..4b3f64e 100644
--- a/src/hbfont.c
+++ b/src/hbfont.c
@@ -26,6 +26,7 @@
 #include "composite.h"
 #include "font.h"
 #include "dispextern.h"
+#include "buffer.h"
 #ifdef HAVE_NTGUI
@@ -438,7 +439,11 @@ hbfont_shape (Lisp_Object lgstring, Lisp_Object direction)
   /* If the caller didn't provide a meaningful DIRECTION, let HarfBuzz
      guess it. */
-  if (!NILP (direction))
+  if (!NILP (direction)
+      /* If they bind bidi-display-reordering to nil, the DIRECTION
+         they provide is meaningless, and we should let HarfBuzz guess
+         the real direction. */
+      && !NILP (BVAR (current_buffer, bidi_display_reordering)))
     {
       hb_direction_t dir = HB_DIRECTION_LTR;
       if (EQ (direction, QL2R))
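The condition this patch introduces can be modelled in isolation. The following is a simplified sketch under stated assumptions, not the hbfont.c code: the enum types and the function choose_direction are hypothetical, and "guess" simply stands for leaving the direction unset so that the shaper determines it itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not Emacs or HarfBuzz types.  */
enum caller_dir { CALLER_NONE, CALLER_L2R, CALLER_R2L };
enum shaper_dir { SHAPE_GUESS, SHAPE_LTR, SHAPE_RTL };

/* Decide which direction to hand to the shaper.  The caller's DIRECTION
   is trusted only when it is set and bidi reordering is enabled;
   otherwise the shaper is left to guess.  */
enum shaper_dir
choose_direction (enum caller_dir direction, bool bidi_reordering_enabled)
{
  if (direction == CALLER_NONE || !bidi_reordering_enabled)
    return SHAPE_GUESS;
  return direction == CALLER_L2R ? SHAPE_LTR : SHAPE_RTL;
}
```

With reordering disabled, as in the shr case discussed above, a caller-supplied L2R no longer forces left-to-right shaping in this model.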
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Thu, 04 Jun 2020 19:53:02 GMT)
Full text and
rfc822 format available.
Message #47 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Pip Cet <pipcet <at> gmail.com>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>, 41005 <at> debbugs.gnu.org,
>> nicholasdrozd <at> gmail.com
>> Date: Thu, 04 Jun 2020 08:28:24 +0000
>>
>> What happens is that font-shape-gstring is called with direction == L2R,
>> mis-shapes the text, then caches that version in the composition gstring
>> cache. The cache doesn't distinguish directions, and it's never cleared,
>> so this "infects" other buffers, but only if they're opened afterwards,
>> and only for the same words.
>>
>> shr appears to force bidi-display-reordering off. Removing that let
>> binding from shr-insert-document avoids the problem.
>> We should consider adding direction to our gstrings, on master. While
>> we're there, let's also add script, language, and harfbuzz features to
>> the gstrings so we know we've captured the precise harfbuzz parameters?
>
> On emacs-27, I can fix this by a simpler band-aid below.
> On master, I think we should indeed add direction to the cached
> gstrings, as there could be other much more subtle situations where
> looking at bidi-display-reordering is not enough, and we could then
> still cache a composition with the wrong direction. Patches welcome.
Sure, such subtle situations exist.
> As for script and language, for now adding them would just waste
> memory, since we don't yet have any meaningful support for
> buffer-local, let-alone paragraph-local, scripts and languages. When
> we added HarfBuzz support, I considered adding some functionality to
> determine language and/or script from the codepoints, but decided that
> it made little sense, since HarfBuzz already does exactly that in
> hb_buffer_guess_segment_properties. So I think we should add to the
> FIXME in hbfont.c the fact that when we do support those internally in
> Emacs, we should add these attributes to cached gstrings, but not yet
> actually add them for now.
We're going to have to change the lgstring structure, though, right?
Could we maybe get away with just doing so once? My suggestion would be
to add a single field to gstrings which would currently be L2R or R2L
but might become an alist or something when we get around to adding
those features?
> Here's the patch I propose for emacs-27:
Let's try that.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Fri, 05 Jun 2020 04:45:01 GMT)
Full text and
rfc822 format available.
Message #50 received at 41005 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I tried the patch.
The eww problem is solved, but the problem is still there when I enable
auto-fill-mode or column-number-mode.
Please look at this video file for a better understanding:
http://s13.picofile.com/d/8399189550/7eeb413f-0df7-4da6-9db1-1632c9fc749f/out.mkv
https://filebin.net/mzmjm74lp7wsxr8e
https://gofile.io/d/H8xk26
For example, if you type in the following sentence:
این نام است که میماند
Then go back a few characters in the same line and type words randomly. You
will see that the letters in some words are displayed separately. I type a
few words at random, after the word این and before the word نام :
این فراموشی را به همه اینکه فرمت مراتب افتتاح گرامی گرایش سراسیمه نام است
که میماند
This line should look like this:
http://s12.picofile.com/file/8399190550/correct.png
But if either auto-fill-mode or column-number-mode is enabled, it will
be displayed this way:
http://s13.picofile.com/file/8399190584/malformed.png
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Fri, 05 Jun 2020 06:22:01 GMT)
Full text and
rfc822 format available.
Message #53 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: hossein valizadeh <valizadeh.ho <at> gmail.com>
> Date: Fri, 5 Jun 2020 09:16:53 +0430
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 41005 <at> debbugs.gnu.org,
> Nicholas Drozd <nicholasdrozd <at> gmail.com>
>
> I tried the patch.
Please tell which patch was that, and in what Emacs version you tried
it. (Please understand that you are generally talking to people some
of whom don't read Arabic or Persian, so the more details you supply
the less misunderstanding and confusion will follow, and the faster
this problem will be solved.)
> The eww problem is solved, but the problem still there, when I enable auto-fill-mode or
> column-number-mode.
>
> Please look at this video file for a better understanding:
> http://s13.picofile.com/d/8399189550/7eeb413f-0df7-4da6-9db1-1632c9fc749f/out.mkv
> https://filebin.net/mzmjm74lp7wsxr8e
> https://gofile.io/d/H8xk26
I cannot play this on my system, I see a bunch of ads (or what looks
like ads), and the name of a .mkv file.
> For example, if you type in the following sentence:
> این نام است که میماند
>
> Then go back a few characters in the same line and type words randomly. You will see that the letters in
> some words are displayed separately. I type a few words at random, after the word این and before the word
> نام :
>
> این فراموشی را به همه اینکه فرمت مراتب افتتاح گرامی گرایش سراسیمه نام است که میماند
>
> This line should look like this:
>
> http://s12.picofile.com/file/8399190550/correct.png
>
> But if one of the auto-fill-mode or column-number-mode is enabled. it will be displayed this way:
>
> http://s13.picofile.com/file/8399190584/malformed.png
Please show a full recipe for reproducing this, starting from
"emacs -Q", and describing every step to reproduce the result. Please
tell the codepoint of each character you type to reproduce the problem
and each Emacs command. I'd also greatly appreciate it if you
specifically pointed out which parts of the display to look at and what
differences to pay attention to. This is needed to fully understand
the problem and analyze its root cause(s), given that not all of us
can read the Arabic script.
The master branch has recently got a few improvements in this area, so
please use that version of the code if you can. And in any case,
please always state in what version you see which problem.
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Fri, 05 Jun 2020 06:40:01 GMT)
Full text and
rfc822 format available.
Message #56 received at 41005 <at> debbugs.gnu.org (full text, mbox):
hossein valizadeh <valizadeh.ho <at> gmail.com> writes:
> I tried the patch.
> The eww problem is solved, but the problem still there, when I enable auto-fill-mode or
> column-number-mode.
I only see it with column-number-mode so far (but, again, I can
reproduce it and debug further).
It's this code in indent.c
  /* Check composition sequence. */
  if (cmp_it.id >= 0
      || (scan == cmp_it.stop_pos
          && composition_reseat_it (&cmp_it, scan, scan_byte, end,
                                    w, NEUTRAL_DIR, NULL, Qnil)))
which appears to think the sixth argument to composition_reseat_it is a
direction (it probably should be). It's actually a bidi level, and
passing NEUTRAL_DIR, which is 0, results in L2R layout being used by
hbfont_shape, as in the eww bug.
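The confusion between a direction and a bidi level can be shown with a small standalone sketch. The names below (shape_dir, direction_from_level) are hypothetical and this is not the Emacs implementation. It relies only on the standard bidi convention that even embedding levels are left-to-right and odd levels right-to-left; treating a negative level as "unknown" is an assumption of the sketch.

```c
/* Hypothetical names for illustration; not the Emacs definitions.  */
enum shape_dir { SHAPE_UNKNOWN, SHAPE_L2R, SHAPE_R2L };

/* Map a bidi embedding level to a shaping direction.  Even levels are
   left-to-right and odd levels right-to-left.  Treating a negative
   level as "unknown" is an assumption of this sketch.  */
enum shape_dir
direction_from_level (int bidi_level)
{
  if (bidi_level < 0)
    return SHAPE_UNKNOWN;
  return (bidi_level & 1) ? SHAPE_R2L : SHAPE_L2R;
}
```

A constant whose value is 0, passed where a level is expected, is read as level 0 and therefore maps to SHAPE_L2R here, even if the caller meant "no particular direction".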
Will write a patch soon if no one beats me to it.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Fri, 05 Jun 2020 08:03:02 GMT)
Full text and
rfc822 format available.
Message #59 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Pip Cet <pipcet <at> gmail.com>
> Cc: valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
> Date: Thu, 04 Jun 2020 19:52:42 +0000
>
> > As for script and language, for now adding them would just waste
> > memory, since we don't yet have any meaningful support for
> > buffer-local, let-alone paragraph-local, scripts and languages. When
> > we added HarfBuzz support, I considered adding some functionality to
> > determine language and/or script from the codepoints, but decided that
> > it made little sense, since HarfBuzz already does exactly that in
> > hb_buffer_guess_segment_properties. So I think we should add to the
> > FIXME in hbfont.c the fact that when we do support those internally in
> > Emacs, we should add these attributes to cached gstrings, but not yet
> > actually add them for now.
>
> We're going to have to change the lgstring structure, though, right?
I think so. We should probably add one more element to the vector in
LGSTRING_HEADER, because the header is the hash key of the composition
cache.
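As a rough sketch of why the extra element belongs in the cache key, consider the following standalone C model. It assumes a flat struct in place of the real Lisp vector, and all names (header_key, header_key_equal) are hypothetical. Once the direction takes part in the comparison, two otherwise identical headers no longer match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum direction { DIR_L2R, DIR_R2L };

/* Hypothetical model of a cache key; the real header is a Lisp vector.  */
struct header_key {
  int font_id;               /* stands in for the font object */
  enum direction dir;        /* the proposed extra element */
  size_t nchars;             /* number of valid entries in CHARS */
  unsigned chars[16];        /* the character codes */
};

/* Two keys are equal only if font, direction and characters all match.  */
bool
header_key_equal (const struct header_key *a, const struct header_key *b)
{
  return a->font_id == b->font_id
         && a->dir == b->dir
         && a->nchars == b->nchars
         && memcmp (a->chars, b->chars,
                    a->nchars * sizeof a->chars[0]) == 0;
}
```

In this model a composition cached for DIR_L2R is not reused when the same characters are later requested as DIR_R2L, because the two keys compare unequal.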
> Could we maybe get away with just doing so once? My suggestion would be
> to add a single field to gstrings which would currently be L2R or R2L
> but might become an alist or something when we get around to adding
> those features?
We could add an element that would currently be a symbol or an
integer, but could later become a vector of several elements. Is that
what you had in mind? (I think we should prefer vectors to lists in
this case, because consing them is slightly faster, and the number of
elements is known in advance and fixed.)
> > Here's the patch I propose for emacs-27:
>
> Let's try that.
Pushed to the emacs-27 branch.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Fri, 05 Jun 2020 08:42:02 GMT)
Full text and
rfc822 format available.
Message #62 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Pip Cet <pipcet <at> gmail.com>
>> Cc: valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
>> Date: Thu, 04 Jun 2020 19:52:42 +0000
>>
>> > As for script and language, for now adding them would just waste
>> > memory, since we don't yet have any meaningful support for
>> > buffer-local, let-alone paragraph-local, scripts and languages. When
>> > we added HarfBuzz support, I considered adding some functionality to
>> > determine language and/or script from the codepoints, but decided that
>> > it made little sense, since HarfBuzz already does exactly that in
>> > hb_buffer_guess_segment_properties. So I think we should add to the
>> > FIXME in hbfont.c the fact that when we do support those internally in
>> > Emacs, we should add these attributes to cached gstrings, but not yet
>> > actually add them for now.
>>
>> We're going to have to change the lgstring structure, though, right?
>
> I think so. We should probably add one more element to the vector in
> LGSTRING_HEADER, because the header is the hash key of the composition
> cache.
Do we have to maintain compatibility? If so, I suggest we change
[FONT-OBJECT CHAR ...]
to
[FONT-OBJECT [CHAR ...] DIRECTION], and use ARRAYP (AREF (..., 2)) to
decide whether the new format is in effect. I actually thought about
suggesting the format be [FONT-OBJECT STRING DIRECTION], but that would
make debugging harder when pretty-printing the string in a failed
composition re-attempts that composition.
But of course it would be easier not to maintain compatibility, and then
we could order the elements any way we choose.
>> Could we maybe get away with just doing so once? My suggestion would be
>> to add a single field to gstrings which would currently be L2R or R2L
>> but might become an alist or something when we get around to adding
>> those features?
>
> We could add an element that would currently be a symbol or an
> integer, but could later become a vector of several elements. Is that
> what you had in mind?
Yes, a vector was what I meant by "or something".
> (I think we should prefer vectors to lists in
> this case, because consing them is slightly faster, and the number of
> elements is known in advance and fixed.)
No argument there, though harfbuzz features, if we ever add them,
probably should be added as a list inside the vector. I'm talking about
things like "kern=0" passed to hb_feature_from_string, then to hb_shape,
to disable kerning.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#41005
; Package
emacs
.
(Fri, 05 Jun 2020 11:09:01 GMT)
Full text and
rfc822 format available.
Message #65 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: hossein valizadeh <valizadeh.ho <at> gmail.com>
>> Date: Fri, 5 Jun 2020 09:16:53 +0430
>> Cc: Eli Zaretskii <eliz <at> gnu.org>, 41005 <at> debbugs.gnu.org,
>> Nicholas Drozd <nicholasdrozd <at> gmail.com>
>>
>> The eww problem is solved, but the problem still there, when I enable auto-fill-mode or
>> column-number-mode.
>>
>> Please look at this video file for a better understanding:
>> http://s13.picofile.com/d/8399189550/7eeb413f-0df7-4da6-9db1-1632c9fc749f/out.mkv
>> https://filebin.net/mzmjm74lp7wsxr8e
>> https://gofile.io/d/H8xk26
>
> I cannot play this on my system, I see a bunch of ads (or what looks
> like ads), and the name of a .mkv file.
On the first page, you need to click the red button with the link icon
twice; according to my beginner-level reading of Perso-Arabic script,
it says "[something] link download".
That said, the easiest way for me to download the file was from the
third gofile link. (All three links point to the same file AFAICT.)
If you're not able to play .mkv on Windows I think VLC supports it OOTB
(but I haven't used Windows in almost a decade so YMMV).
The large screen resolution of the screencast makes it a bit hard for me
to tell what's going on on my 14" laptop, though.
HTH,
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Fri, 05 Jun 2020 11:43:01 GMT)
Message #68 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Pip Cet <pipcet <at> gmail.com>
> Cc: valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
> Date: Fri, 05 Jun 2020 08:41:19 +0000
>
> >> We're going to have to change the lgstring structure, though, right?
> >
> > I think so. We should probably add one more element to the vector in
> > LGSTRING_HEADER, because the header is the hash key of the composition
> > cache.
>
> Do we have to maintain compatibility? If so, I suggest we change
>
> [FONT-OBJECT CHAR ...]
>
> to
>
> [FONT-OBJECT [CHAR ...] DIRECTION], and use ARRAYP (AREF (..., 2)) to
> decide whether the new format is in effect. I actually thought about
> suggesting the format be [FONT-OBJECT STRING DIRECTION], but that would
> make debugging harder when pretty-printing the string in a failed
> composition re-attempts that composition.
>
> But of course it would be easier not to maintain compatibility, and then
> we could order the elements any way we choose.
We don't have to be backward-compatible here, I think, as the
structure of the header is an internal implementation detail. So
something like [FONT-OBJECT DIRECTION CHAR ...] is also a possibility.
We could also put DIRECTION elsewhere and just modify the code that
passes the hash key to the hash function.
> > (I think we should prefer vectors to lists in
> > this case, because consing them is slightly faster, and the number of
> > elements is known in advance and fixed.)
>
> No argument there, though harfbuzz features, if we ever add them,
> probably should be added as a list inside the vector. I'm talking about
> things like "kern=0" passed to hb_feature_from_string, then to hb_shape,
> to disable kerning.
Maybe. Something to consider when we actually support those features.
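As a rough illustration of why the header layout matters when it doubles as the hash key, the following stand-alone sketch models a key laid out as [FONT-OBJECT DIRECTION CHAR ...]. Everything in it is hypothetical: `comp_key`, `comp_key_equal`, `comp_key_hash` and the `comp_dir` values are illustrative names, not the actual LGSTRING code in composite.c, and the hash is a simple FNV-1a-style mix over 32-bit words chosen only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the three direction states discussed in
   the thread: nil (let the shaper guess), L2R and R2L.  */
enum comp_dir { DIR_GUESS = 0, DIR_L2R = 1, DIR_R2L = 2 };

#define COMP_KEY_MAX_CHARS 8

/* Toy model of a composition-cache key laid out as
   [FONT-OBJECT DIRECTION CHAR ...].  Not the real LGSTRING header.  */
struct comp_key
{
  uint32_t font_id;                    /* stands in for FONT-OBJECT */
  enum comp_dir direction;             /* the newly added element */
  size_t nchars;
  uint32_t chars[COMP_KEY_MAX_CHARS];  /* codepoints in logical order */
};

/* Two keys match only if font, direction and all characters match.  */
static int
comp_key_equal (const struct comp_key *a, const struct comp_key *b)
{
  assert (a->nchars <= COMP_KEY_MAX_CHARS);
  return (a->font_id == b->font_id
          && a->direction == b->direction
          && a->nchars == b->nchars
          && memcmp (a->chars, b->chars,
                     a->nchars * sizeof a->chars[0]) == 0);
}

/* Mix every field that takes part in equality, so that equal keys
   always produce the same hash value.  */
static uint32_t
comp_key_hash (const struct comp_key *k)
{
  uint32_t h = 2166136261u;   /* FNV offset basis */
  size_t i;

  assert (k->nchars <= COMP_KEY_MAX_CHARS);
  h = (h ^ k->font_id) * 16777619u;
  h = (h ^ (uint32_t) k->direction) * 16777619u;
  for (i = 0; i < k->nchars; i++)
    h = (h ^ k->chars[i]) * 16777619u;
  return h;
}
```

With a key like this, the same characters in the same font but with a different direction compare as different entries, so a lookup for the other direction misses and the text is shaped again.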
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Fri, 05 Jun 2020 12:31:02 GMT)
Message #71 received at 41005 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
> Please tell which patch was that, and in what Emacs version you tried
> it. (Please understand that you are generally talking to people some
> of whom don't read Arabic or Persian, so the more details you supply
> the less misunderstanding and confusion will follow, and the faster
> this problem will be solved.)
I'm sorry, my English is so bad, at least in writing. That's why it's
hard for me to give a full explanation.
My Emacs version: 27.0.91
I applied this patch:
diff --git a/src/hbfont.c b/src/hbfont.c
index 576c5fe..4b3f64e 100644
--- a/src/hbfont.c
+++ b/src/hbfont.c
@@ -26,6 +26,7 @@
 #include "composite.h"
 #include "font.h"
 #include "dispextern.h"
+#include "buffer.h"
 #ifdef HAVE_NTGUI
@@ -438,7 +439,11 @@ hbfont_shape (Lisp_Object lgstring, Lisp_Object direction)
   /* If the caller didn't provide a meaningful DIRECTION, let HarfBuzz
      guess it. */
-  if (!NILP (direction))
+  if (!NILP (direction)
+      /* If they bind bidi-display-reordering to nil, the DIRECTION
+         they provide is meaningless, and we should let HarfBuzz guess
+         the real direction. */
+      && !NILP (BVAR (current_buffer, bidi_display_reordering)))
     {
       hb_direction_t dir = HB_DIRECTION_LTR;
       if (EQ (direction, QL2R))
--------------------------------------------------------------------------------
> I cannot play this on my system, I see a bunch of ads (or what looks
> like ads), and the name of a .mkv file.
video file reuploaded (.mp4 and .ogg):
https://srv-file9.gofile.io/download/cz7P41/out.mp4
https://srv-file4.gofile.io/download/Mwv8k4/out.ogg
--------------------------------------------------------------------------------
This patch solved the display of Persian/Arabic words in eww, newsticker
(and possibly some other major modes). However, if either auto-fill-mode
or column-number-mode is enabled, the same problem still occurs in files
that use Persian or Arabic characters, especially when you go back a few
characters in a line and add something to that line.
--------------------------------------------------------------------------------
> Please tell the codepoint of each character you type to reproduce
> the problem and each Emacs command.
For example, if you type in the following sentence:
این نام است که میماند
codepoint:
\u0627\u06cc\u0646 \u0646\u0627\u0645 \u0627\u0633\u062a \u06a9\u0647
\u0645\u06cc\u200c\u0645\u0627\u0646\u062f
Then go back a few characters in the same line and type some random words.
You will see that the letters in some words are displayed separately. For
example, I typed a few random words after the word این and before the word نام:
این فراموشی را به همه اینکه فرمت مراتب افتتاح گرامی گرایش سراسیمه نام است
که میماند
codepoint:
\u0627\u06cc\u0646 \u0641\u0631\u0627\u0645\u0648\u0634\u06cc \u0631\u0627
\u0628\u0647 \u0647\u0645\u0647 \u0627\u06cc\u0646\u06a9\u0647
\u0641\u0631\u0645\u062a \u0645\u0631\u0627\u062a\u0628
\u0627\u0641\u062a\u062a\u0627\u062d \u06af\u0631\u0627\u0645\u06cc
\u06af\u0631\u0627\u06cc\u0634 \u0633\u0631\u0627\u0633\u06cc\u0645\u0647
\u0646\u0627\u0645 \u0627\u0633\u062a \u06a9\u0647
\u0645\u06cc\u200c\u0645\u0627\u0646\u062f
This line should look like this:
http://s12.picofile.com/file/8399190550/correct.png
But if either auto-fill-mode or column-number-mode is enabled, it will
be displayed this way:
http://s13.picofile.com/file/8399190584/malformed.png
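Codepoint listings like the ones above can be produced mechanically from the UTF-8 text, which also makes invisible characters such as U+200C (ZERO WIDTH NON-JOINER) in the last word easy to spot. The sketch below is a minimal decoder written for illustration only; `utf8_decode` and `utf8_codepoints` are hypothetical helper names, and the decoder deliberately does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at S (NUL-terminated input).
   Store the codepoint in *CP and return the number of bytes used,
   or 0 at end of string or on a malformed lead/continuation byte.  */
static size_t
utf8_decode (const unsigned char *s, uint32_t *cp)
{
  size_t len, i;

  if (s[0] == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    {
      *cp = s[0] & 0x1F;
      len = 2;
    }
  else if ((s[0] & 0xF0) == 0xE0)
    {
      *cp = s[0] & 0x0F;
      len = 3;
    }
  else if ((s[0] & 0xF8) == 0xF0)
    {
      *cp = s[0] & 0x07;
      len = 4;
    }
  else
    return 0;

  for (i = 1; i < len; i++)
    {
      /* A NUL byte also fails this test, so we never read past the
         terminator.  */
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      *cp = (*cp << 6) | (s[i] & 0x3F);
    }
  return len;
}

/* Fill OUT with up to MAX codepoints of the UTF-8 string S and
   return how many were stored.  */
static size_t
utf8_codepoints (const char *s, uint32_t *out, size_t max)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t n = 0, used;
  uint32_t cp;

  assert (s != NULL && out != NULL);
  while (n < max && (used = utf8_decode (p, &cp)) != 0)
    {
      out[n++] = cp;
      p += used;
    }
  return n;
}
```

Printing each stored value with a format such as `\u%04x` gives output in the same style as the listings in this message.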
--------------------------------------------------------------------------------
> Please show a full recipe for reproducing this, starting from
> "emacs -Q", and describing every step to reproduce the result.
All steps are displayed in the video file.
> I'd also greatly appreciate if you specifically point out which
> parts of the display to look at and what differences to pay attention
> to. This is needed to fully understand the problem and analyze its
> root cause(s), given that not all of us can read the Arabic script.
The letters that should normally be connected will be displayed separately:
http://s12.picofile.com/file/8399222618/latest_screenshot.png
--------------------------------------------------------------------------------
[Message part 2 (text/html, inline)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Fri, 05 Jun 2020 12:54:01 GMT)
Message #74 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: hossein valizadeh <valizadeh.ho <at> gmail.com>
> Date: Fri, 5 Jun 2020 17:02:55 +0430
> Cc: pipcet <at> gmail.com, 41005 <at> debbugs.gnu.org,
> Nicholas Drozd <nicholasdrozd <at> gmail.com>
>
> I'm sorry, my English is so bad, at least in writing. That's why it's
> hard for me to give a full explanation.
Your English is entirely adequate, there's nothing for you to be
ashamed of.
Thanks for the details, I think they point to the code identified by
Pip Cet, which is called from current-column and similar APIs. A fix
will probably be available soon.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Fri, 05 Jun 2020 13:06:01 GMT)
Message #77 received at 41005 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: hossein valizadeh <valizadeh.ho <at> gmail.com>
>> Date: Fri, 5 Jun 2020 17:02:55 +0430
>> Cc: pipcet <at> gmail.com, 41005 <at> debbugs.gnu.org,
>> Nicholas Drozd <nicholasdrozd <at> gmail.com>
>>
>> I'm sorry, my English is so bad, at least in writing. That's why it's
>> hard for me to give a full explanation.
>
> Your English is entirely adequate, there's nothing for you to be
> ashamed of.
>
> Thanks for the details, I think they point to the code identified by
> Pip Cet, which is called from current-column and similar APIs. A fix
> will probably be available soon.
I think the attached patch is a fairly minimal fix; it's against master,
applies to emacs-27 but I haven't tested it there.
Given these two bugs, I wonder whether it wouldn't be more reasonable
always to let HarfBuzz guess the direction, at least for Emacs-27:
scripts which change direction, if they are supported by HarfBuzz, won't
work anyway.
Or am I missing something?
[0001-Test-patch-for-bug-41005.patch (text/x-diff, inline)]
From 18d0e15ac298f40951ddeeec56e9d87c01f51798 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet <at> gmail.com>
Date: Fri, 5 Jun 2020 12:54:01 +0000
Subject: [PATCH] Test patch for bug#41005
---
src/composite.c | 4 +++-
src/indent.c | 4 ++--
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/src/composite.c b/src/composite.c
index 2c589e4f3a..13421a80da 100644
--- a/src/composite.c
+++ b/src/composite.c
@@ -1213,7 +1213,9 @@ composition_reseat_it (struct composition_it *cmp_it, ptrdiff_t charpos,
             continue;
           if (charpos < endpos)
             {
-              if ((bidi_level & 1) == 0)
+              if (bidi_level < 0)
+                direction = Qnil;
+              else if ((bidi_level & 1) == 0)
                 direction = QL2R;
               else
                 direction = QR2L;
diff --git a/src/indent.c b/src/indent.c
index c0b4c13b2c..581323b91e 100644
--- a/src/indent.c
+++ b/src/indent.c
@@ -596,7 +596,7 @@ scan_for_column (ptrdiff_t *endpos, EMACS_INT *goalcol, ptrdiff_t *prevcol)
       if (cmp_it.id >= 0
           || (scan == cmp_it.stop_pos
               && composition_reseat_it (&cmp_it, scan, scan_byte, end,
-                                        w, NEUTRAL_DIR, NULL, Qnil)))
+                                        w, -1, NULL, Qnil)))
         composition_update_it (&cmp_it, scan, scan_byte, Qnil);
       if (cmp_it.id >= 0)
         {
@@ -1504,7 +1504,7 @@ compute_motion (ptrdiff_t from, ptrdiff_t frombyte, EMACS_INT fromvpos,
           if (cmp_it.id >= 0
               || (pos == cmp_it.stop_pos
                   && composition_reseat_it (&cmp_it, pos, pos_byte, to, win,
-                                            NEUTRAL_DIR, NULL, Qnil)))
+                                            -1, NULL, Qnil)))
             composition_update_it (&cmp_it, pos, pos_byte, Qnil);
           if (cmp_it.id >= 0)
             {
--
2.27.0.rc0
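The decision made in the composite.c hunk can be read in isolation as a three-way mapping from bidi level to shaping direction. The sketch below restates it as a stand-alone function; `shape_dir` and `shape_dir_from_bidi_level` are hypothetical names used only for illustration, with `SHAPE_DIR_GUESS` standing in for passing Qnil so that the shaper guesses.

```c
#include <assert.h>

/* Hypothetical stand-ins for Qnil, QL2R and QR2L.  */
enum shape_dir { SHAPE_DIR_GUESS, SHAPE_DIR_L2R, SHAPE_DIR_R2L };

/* Mirror of the patched logic: a negative level means the caller
   does not know the resolved bidi level, so no direction is passed
   and the shaper is left to guess.  Otherwise even levels are
   left-to-right and odd levels are right-to-left.  */
static enum shape_dir
shape_dir_from_bidi_level (int bidi_level)
{
  if (bidi_level < 0)
    return SHAPE_DIR_GUESS;
  return (bidi_level & 1) == 0 ? SHAPE_DIR_L2R : SHAPE_DIR_R2L;
}
```

Before the patch, the callers in indent.c passed NEUTRAL_DIR where a level was expected. Assuming NEUTRAL_DIR has the value 0, as the first member of the bidi direction enum, that was treated as an even level and so requested L2R shaping even for Persian text; passing -1 instead selects the "guess" branch.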
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Fri, 05 Jun 2020 14:14:02 GMT)
Message #80 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Pip Cet <pipcet <at> gmail.com>
> Cc: hossein valizadeh <valizadeh.ho <at> gmail.com>, 41005 <at> debbugs.gnu.org,
> nicholasdrozd <at> gmail.com
> Date: Fri, 05 Jun 2020 13:05:43 +0000
>
> I think the attached patch is a fairly minimal fix; it's against master,
> applies to emacs-27 but I haven't tested it there.
Thanks, it LGTM.
I think we should put this on emacs-27, because this is a regression
caused by Emacs 27's support for HarfBuzz as the default shaping
engine. The other shapers didn't want us to provide the direction,
they determined it internally. We added the DIRECTION argument as
part of integrating HarfBuzz.
We could do better than your patch by actually computing the resolved
bidi level there, which would require start_display followed by
move_it_to, in which case we probably won't need to call
composition_reseat_it by hand at all, and could just pick up the
result produced by move_it_to. Or maybe we should just use
Fvertical_motion instead (which does all that internally). But these
ideas are for the master branch, not for emacs-27.
> Given these two bugs, I wonder whether it wouldn't be more reasonable
> always to let HarfBuzz guess the direction, at least for Emacs-27:
> scripts which change direction, if they are supported by HarfBuzz, won't
> work anyway.
Please explain "scripts that change direction" and "won't work
anyway", I don't think I understand that part.
The reason we don't let HarfBuzz guess in all cases is because the
resolved bidi level, when we have it, is a more accurate indication of
the required direction. For example, if you have RTL characters
inside the LRO..PDF embedding, it would be wrong to let the shaper
guess, because it could (and usually will) guess wrongly that the
direction is R2L. It is true that these are rare and unusual use
cases, but they do exist, and Emacs does want to support them,
including with scripts that must use the shaping engine.
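The LRO..PDF case can be made concrete with a toy model. The sketch below is not the Unicode Bidirectional Algorithm and not HarfBuzz's actual guessing code; `toy_guess_dir` is a simplified stand-in for "let the shaper guess" (direction of the first non-neutral character), and `toy_resolved_dir` is a simplified stand-in for the resolved level (a single, non-nested override wins). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_dir { TOY_DIR_NONE, TOY_DIR_L2R, TOY_DIR_R2L };

/* Unicode explicit directional formatting characters.  */
#define CH_PDF 0x202C  /* POP DIRECTIONAL FORMATTING */
#define CH_LRO 0x202D  /* LEFT-TO-RIGHT OVERRIDE */
#define CH_RLO 0x202E  /* RIGHT-TO-LEFT OVERRIDE */

/* Very rough per-character direction: ASCII letters are L2R, the
   Hebrew and Arabic blocks (U+0590..U+06FF) are R2L, everything
   else is treated as neutral.  */
static enum toy_dir
toy_char_dir (uint32_t c)
{
  if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
    return TOY_DIR_L2R;
  if (c >= 0x0590 && c <= 0x06FF)
    return TOY_DIR_R2L;
  return TOY_DIR_NONE;
}

/* Stand-in for a shaper guess: direction of the first non-neutral
   character, defaulting to L2R.  */
static enum toy_dir
toy_guess_dir (const uint32_t *text, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    {
      enum toy_dir d = toy_char_dir (text[i]);
      if (d != TOY_DIR_NONE)
        return d;
    }
  return TOY_DIR_L2R;
}

/* Stand-in for the resolved direction of TEXT[IDX]: an enclosing
   LRO or RLO wins; otherwise fall back to the character itself.  */
static enum toy_dir
toy_resolved_dir (const uint32_t *text, size_t n, size_t idx)
{
  enum toy_dir override = TOY_DIR_NONE;
  enum toy_dir own;
  size_t i;

  assert (idx < n);
  for (i = 0; i < idx; i++)
    {
      if (text[i] == CH_LRO)
        override = TOY_DIR_L2R;
      else if (text[i] == CH_RLO)
        override = TOY_DIR_R2L;
      else if (text[i] == CH_PDF)
        override = TOY_DIR_NONE;
    }
  if (override != TOY_DIR_NONE)
    return override;
  own = toy_char_dir (text[idx]);
  return own != TOY_DIR_NONE ? own : TOY_DIR_L2R;
}
```

For two Hebrew letters wrapped in LRO..PDF, the guess based on the letters alone is R2L, while the override-aware result is L2R, which is the mismatch described above.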
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Fri, 05 Jun 2020 14:22:01 GMT)
Message #83 received at 41005 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I tested Pip Cet's patch on Emacs 27.0.91 and everything was fine.
In both cases (auto-fill-mode and column-number-mode) the letters were
no longer garbled, and all the words were displayed correctly.
Thanks.
[Message part 2 (text/html, inline)]
[0001-Test-patch-for-bug-41005.patch (text/x-patch, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Fri, 05 Jun 2020 14:26:02 GMT)
Message #86 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: hossein valizadeh <valizadeh.ho <at> gmail.com>
> Date: Fri, 5 Jun 2020 18:53:40 +0430
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 41005 <at> debbugs.gnu.org,
> Nicholas Drozd <nicholasdrozd <at> gmail.com>
>
> I tested Pip Cet's patch on Emacs 27.0.91 and everything was fine.
> In both cases (auto-fill-mode & column-number-mode) There was no
> confusion in the letters; and all the words were displayed correctly.
Great, thanks for testing it.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 06 Jun 2020 08:39:01 GMT)
Message #89 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Pip Cet <pipcet <at> gmail.com>
>> Cc: hossein valizadeh <valizadeh.ho <at> gmail.com>, 41005 <at> debbugs.gnu.org,
>> nicholasdrozd <at> gmail.com
>> Date: Fri, 05 Jun 2020 13:05:43 +0000
>>
>> I think the attached patch is a fairly minimal fix; it's against master,
>> applies to emacs-27 but I haven't tested it there.
>
> Thanks, it LGTM.
>
> I think we should put this on emacs-27, because this is a regression
> caused by Emacs 27's support for HarfBuzz as the default shaping
> engine. The other shapers didn't want us to provide the direction,
> they determined it internally. We added the DIRECTION argument as
> part of integrating HarfBuzz.
Okay, will do that unless someone objects.
> We could do better than your patch by actually computing the resolved
> bidi level there, which would require start_display followed by
> move_it_to, in which case we probably won't need to call
> composition_reseat_it by hand at all, and could just pick up the
> result produced by move_it_to. Or maybe we should just use
> Fvertical_motion instead (which does all that internally). But these
> ideas are for the master branch, not for emacs-27.
That sounds good.
>> Given these two bugs, I wonder whether it wouldn't be more reasonable
>> always to let HarfBuzz guess the direction, at least for Emacs-27:
>> scripts which change direction, if they are supported by HarfBuzz, won't
>> work anyway.
>
> Please explain "scripts that change direction" and "won't work
> anyway", I don't think I understand that part.
I think your example (RLO..PDF in RTL text) is better: that won't work
anyway, right now, because if, for example, you type
<HEBREW LETTER SHIN> <RIGHT-TO-LEFT OVERRIDE> f i
and have set the char table to treat "fi" as a ligature, the result will
(at least sometimes) be an "fi" ligature, but it should look like the
word "if".
> The reason we don't let HarfBuzz guess in all cases is because the
> resolved bidi level, when we have it, is a more accurate indication of
> the required direction.
Yes, but we'll still cache the wrong direction. If we let HarfBuzz guess
in all cases, output will be consistent and usually correct.
> For example, if you have RTL characters
> inside the LRO..PDF embedding, it would be wrong to let the shaper
> guess, because it could (and usually will) guess wrongly that the
> direction is R2L. It is true that these are rare and unusual use
> cases, but they do exist, and Emacs does want to support them,
> including with scripts that must use the shaping engine.
As I described, I don't think RLO..PDF works with shaping right now,
because other code might have already cached the non-overridden glyph
string.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 06 Jun 2020 09:05:01 GMT)
Message #92 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Pip Cet <pipcet <at> gmail.com>
> Cc: valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
> Date: Sat, 06 Jun 2020 08:38:39 +0000
>
> >> Given these two bugs, I wonder whether it wouldn't be more reasonable
> >> always to let HarfBuzz guess the direction, at least for Emacs-27:
> >> scripts which change direction, if they are supported by HarfBuzz, won't
> >> work anyway.
> >
> > Please explain "scripts that change direction" and "won't work
> > anyway", I don't think I understand that part.
>
> I think your example (RLO..PDF in RTL text) is better: that won't work
> anyway, right now, because if, for example, you type
>
> <HEBREW LETTER SHIN> <RIGHT-TO-LEFT OVERRIDE> f i
>
> and have set the char table to treat "fi" as a ligature, the result will
> (at least sometimes) be an "fi" ligature, but it should look like the
> word "if".
That's not how shaping engines work, at least not how HarfBuzz does
AFAIU. It gets the characters in the logical order, so it always
wants to see "fi", even if the directionality of the characters was
overridden, and it also wants to know the local text directionality.
What is produced from that depends on the font: if it has different
ligatures for "fi" in different directions, then HarfBuzz should give
us back the ligature appropriate for the direction it was passed.
(Personally, I think that when some text uses a directional override,
they don't intend to see ligatures, because the override is mostly for
treating characters as independent of the surrounding context. But
this is eventually up to the font to specify. AFAIU, Arabic shaping
works differently in different directional contexts, for example.)
> > The reason we don't let HarfBuzz guess in all cases is because the
> > resolved bidi level, when we have it, is a more accurate indication of
> > the required direction.
>
> Yes, but we'll still cache the wrong direction.
Why "wrong"? We will cache the same direction as we passed to
HarfBuzz, and thus the produced glyphs will be consistent with the
cached direction. And if we ever need to display the same sequence of
characters with a different direction, the cached sequence will fail
to match, and we will call HarfBuzz again to produce glyphs for this
other direction. That sounds TRT to me.
> If we let HarfBuzz guess in all cases, output will be consistent and
> usually correct
We want the direction to be _always_ correct, not just "usually". The
shapers we used before HarfBuzz didn't allow us to pass the direction,
they always guessed it. HarfBuzz lets us specify the direction, which
is progress, since Emacs now has better control over the glyphs that
are produced, and HarfBuzz developers tell us the difference sometimes
matters.
> > For example, if you have RTL characters
> > inside the LRO..PDF embedding, it would be wrong to let the shaper
> > guess, because it could (and usually will) guess wrongly that the
> > direction is R2L. It is true that these are rare and unusual use
> > cases, but they do exist, and Emacs does want to support them,
> > including with scripts that must use the shaping engine.
>
> As I described, I don't think RLO..PDF works with shaping right now,
> because other code might have already cached the non-overridden glyph
> string.
I was saying that under the assumption that the direction will be
cached. You are right that currently this doesn't work correctly, but
that's exactly why we agreed to cache the direction with the other
composition information. Once the caching of direction is
implemented, my point is that passing the direction to HarfBuzz and
caching it will produce better results for text in a directional
override than if we let HarfBuzz guess the direction.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 06 Jun 2020 09:12:01 GMT)
Message #95 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Pip Cet <pipcet <at> gmail.com>
>> Cc: valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
>> Date: Sat, 06 Jun 2020 08:38:39 +0000
>>
>> >> Given these two bugs, I wonder whether it wouldn't be more reasonable
>> >> always to let HarfBuzz guess the direction, at least for Emacs-27:
~~~~~~~~~~~~~~~~~~~~~
I should have been clearer here and said I was only concerned with Emacs 27.
>> >> scripts which change direction, if they are supported by HarfBuzz, won't
>> >> work anyway.
>> >
>> > Please explain "scripts that change direction" and "won't work
>> > anyway", I don't think I understand that part.
>>
>> I think your example (RLO..PDF in RTL text) is better: that won't work
>> anyway, right now, because if, for example, you type
>>
>> <HEBREW LETTER SHIN> <RIGHT-TO-LEFT OVERRIDE> f i
>>
>> and have set the char table to treat "fi" as a ligature, the result will
>> (at least sometimes) be an "fi" ligature, but it should look like the
>> word "if".
>
> That's not how shaping engines work, at least not how HarfBuzz does
> AFAIU. It gets the characters in the logical order, so it always
> wants to see "fi", even if the directionality of the characters was
> overridden, and it also wants to know the local text directionality.
> What is produced from that depends on the font: if it has different
> ligatures for "fi" in different directions, then HarfBuzz should give
> us back the ligature appropriate for the direction it was passed.
>
> (Personally, I think that when some text uses a directional override,
> they don't intend to see ligatures, because the override is mostly for
> treating characters as independent of the surrounding context. But
> this is eventually up to the font to specify. AFAIU, Arabic shaping
> works differently in different directional contexts, for example.)
>
>> > The reason we don't let HarfBuzz guess in all cases is because the
>> > resolved bidi level, when we have it, is a more accurate indication of
>> > the required direction.
>>
>> Yes, but we'll still cache the wrong direction.
>
> Why "wrong"? We will cache the same direction as we passed to
> HarfBuzz, and thus the produced glyphs will be consistent with the
> cached direction. And if we ever need to display the same sequence of
> characters with a different direction, the cached sequence will fail
> to match, and we will call HarfBuzz again to produce glyphs for this
> other direction. That sounds TRT to me.
You're absolutely correct, sorry for wasting so much of your time with
this: caching directions is the right thing, I was just concerned about
what to do in Emacs 27 where AIUI we don't want to cache directions...
> Once the caching of direction is
> implemented, my point is that passing the direction to HarfBuzz and
> caching it will produce better results for text in a directional
> override than if we let HarfBuzz guess the direction.
Again, I agree. Sorry for the misunderstanding.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 06 Jun 2020 09:26:02 GMT)
Message #98 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Pip Cet <pipcet <at> gmail.com>
> Cc: valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
> Date: Sat, 06 Jun 2020 09:11:36 +0000
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> >> From: Pip Cet <pipcet <at> gmail.com>
> >> Cc: valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
> >> Date: Sat, 06 Jun 2020 08:38:39 +0000
> >>
> >> >> Given these two bugs, I wonder whether it wouldn't be more reasonable
> >> >> always to let HarfBuzz guess the direction, at least for Emacs-27:
> ~~~~~~~~~~~~~~~~~~~~~
>
> I should have been clearer here and said I was only concerned with Emacs 27.
Ah, okay. That settles some of the misunderstanding.
But even in emacs-27, I think passing the right direction where we
know it is better. For example, the use case with directional
override is only problematic in emacs-27 if the following conditions
are both true:
. the same sequence of characters is used elsewhere, but without the
override
. the font glyphs produced by the shaper are different for different
directions
The first is somewhat likely to happen, but the second happens only
for some specific scripts, such as Arabic (again, if we consider the
scope of what is supported well by Emacs 27, which, for example,
excludes ligatures).
And the use case with directional overrides is itself very rare. The
more frequent use cases which hit on this deficiency in emacs-27 are
those which you just fixed on emacs-27, and the fix is indeed to let
HarfBuzz guess.
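The two conditions listed above can be demonstrated with a deliberately tiny model: a one-entry cache and a fake shaper whose output depends on direction. Nothing here reflects the real composition cache; `toy_cache`, `toy_shape` and `toy_lookup` are hypothetical, and the `dir_in_key` flag only switches between "direction is not part of the key" (the emacs-27 situation as described in this thread) and "direction is part of the key" (the plan for master).

```c
#include <assert.h>
#include <stdint.h>

/* One-entry toy cache.  DIR_IN_KEY selects whether the direction
   takes part in the lookup.  */
struct toy_cache
{
  int valid;
  uint32_t ch;
  int dir;            /* 0 = L2R, 1 = R2L */
  int glyph;
  int dir_in_key;
  int shaper_calls;   /* how often the fake shaper had to run */
};

/* Fake shaper satisfying the second condition: the glyph it returns
   differs between directions.  */
static int
toy_shape (uint32_t ch, int dir)
{
  return (int) ch * 2 + dir;
}

/* Return the cached glyph on a hit; otherwise shape and cache.  */
static int
toy_lookup (struct toy_cache *c, uint32_t ch, int dir)
{
  assert (dir == 0 || dir == 1);
  if (c->valid && c->ch == ch && (!c->dir_in_key || c->dir == dir))
    return c->glyph;
  c->shaper_calls++;
  c->valid = 1;
  c->ch = ch;
  c->dir = dir;
  c->glyph = toy_shape (ch, dir);
  return c->glyph;
}
```

Looking up the same character first as R2L and then as L2R returns the stale R2L glyph when the direction is not in the key, and triggers a second shaper call when it is. If the fake shaper returned the same glyph for both directions, the stale hit would be harmless, which is why both conditions are needed for a visible problem.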
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 06 Jun 2020 13:11:01 GMT)
Message #101 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Pip Cet <pipcet <at> gmail.com>
>> Cc: valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
>> Date: Sat, 06 Jun 2020 09:11:36 +0000
>>
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>> >> From: Pip Cet <pipcet <at> gmail.com>
>> >> Cc: valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org,
>> >> nicholasdrozd <at> gmail.com
>> >> Date: Sat, 06 Jun 2020 08:38:39 +0000
>> >>
>> >> >> Given these two bugs, I wonder whether it wouldn't be more reasonable
>> >> >> always to let HarfBuzz guess the direction, at least for Emacs-27:
>> ~~~~~~~~~~~~~~~~~~~~~
>>
>> I should have been clearer here and said I was only concerned with Emacs 27.
>
> Ah, okay. That settles some of the misunderstanding.
>
> But even in emacs-27, I think passing the right direction where we
> know it is better. For example, the use case with directional
> override is only problematic in emacs-27 if the following conditions
> are both true:
>
> . the same sequence of characters is used elsewhere, but without the
> override
> . the font glyphs produced by the shaper are different for different
> directions
Thanks, you've convinced me. Let's do it like that.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Tue, 21 Jul 2020 12:42:01 GMT)
Message #104 received at 41005 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi Eli, Pip,
Thank you for working on this and fixing it on emacs-27 per the reports.
I have not been able to keep a close eye on GNU lists and trackers for a
while now, but I was wondering if there's been any more progress/updates
on this issue on master? I sometimes read/write in Persian, and this
bug is quite an annoyance. :-) Changing the text scale (C-x C-=) and/or
filling the paragraph (M-q) seems to sometimes help, but not always; and
the issue remains.
Thanks again for your work on this.
[signature.asc (application/pgp-signature, inline)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Tue, 21 Jul 2020 13:35:02 GMT)
Message #107 received at 41005 <at> debbugs.gnu.org (full text, mbox):
>>>>> On Tue, 21 Jul 2020 08:40:56 -0400, Amin Bandali <bandali <at> gnu.org> said:
Amin> Hi Eli, Pip,
Amin> Thank you for working on this and fixing it on emacs-27 per the reports.
Amin> I have not been able to keep a close eye on GNU lists and trackers for a
Amin> while now, but I was wondering if there's been any more progress/updates
Amin> on this issue on master? I sometimes read/write in Persian, and this
Amin> bug is quite an annoyance. :-) Changing the text scale (C-x C-=) and/or
Amin> filling the paragraph (M-q) seems to sometimes help, but not always; and
Amin> the issue remains.
emacs-27 gets merged to master on a regular basis, and the relevant
commit is present in both:
$ git branch --contains 30a7ee505a
* emacs-27
+ master
Are you still seeing issues with the latest master?
Robert
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Tue, 21 Jul 2020 17:54:02 GMT)
Message #110 received at 41005 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Robert Pluim <rpluim <at> gmail.com> writes:
>>>>>> On Tue, 21 Jul 2020 08:40:56 -0400, Amin Bandali <bandali <at> gnu.org> said:
>
> Amin> Hi Eli, Pip,
> Amin> Thank you for working on this and fixing it on emacs-27 per the reports.
>
> Amin> I have not been able to keep a close eye on GNU lists and trackers for a
> Amin> while now, but I was wondering if there's been any more progress/updates
> Amin> on this issue on master? I sometimes read/write in Persian, and this
> Amin> bug is quite an annoyance. :-) Changing the text scale (C-x C-=) and/or
> Amin> filling the paragraph (M-q) seems to sometimes help, but not always; and
> Amin> the issue remains.
>
> emacs-27 gets merged to master on a regular basis, and the relevant
> commit is present in both:
>
> $ git branch --contains 30a7ee505a
> * emacs-27
> + master
>
> Are you still seeing issues with the latest master?
>
> Robert
>
Ah, great point! This got me suspicious, so I went ahead and actually
tried both emacs-27 and master in a few different configurations myself.
It seems that with emacs-27, Xft rendering of Arabic/Persian text works
just fine, like it used to: <https://p.bndl.org/persian-emacs-xft.png>.
However, using Cairo+HarfBuzz, on both emacs-27 and master the issue is
still present: <https://p.bndl.org/persian-emacs-ftcrhb.png>.
Excerpts from M-x describe-char RET on a Persian character from Xft and
Cairo+HarfBuzz are at <https://p.bndl.org/persian-emacs-xft.txt> and
<https://p.bndl.org/persian-emacs-ftcrhb.txt> respectively, in case that
might be useful somehow. The font used for typesetting the Persian text
is Vazir v22.1.0, available from
<https://github.com/rastikerdar/vazir-font/releases/download/v22.1.0/vazir-font-v22.1.0.zip>.
My .Xresources configuration for the Xft and Cairo+HarfBuzz setups, the
former only for emacs-27, and the latter for both emacs-27 and master:
! Xft (emacs-27 only)
Emacs.FontBackend: xft,x
Emacs.font: Source Code Pro Medium:size=14
! Cairo+HarfBuzz (emacs-27 and master)
Emacs.FontBackend: ftcrhb,x
Emacs.font: Source Code Pro Medium:size=14
Lastly, it might be worth mentioning that if I recall correctly, when
using xfthb with emacs-27, I observe the same issue. Which may suggest
that perhaps the issue is related to Emacs's HarfBuzz support.
Hope this is useful.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Tue, 21 Jul 2020 18:28:01 GMT)
Message #113 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Amin Bandali <bandali <at> gnu.org>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, valizadeh.ho <at> gmail.com,
> 41005 <at> debbugs.gnu.org, Pip Cet <pipcet <at> gmail.com>
> Date: Tue, 21 Jul 2020 13:53:25 -0400
>
> Lastly, it might be worth mentioning that if I recall correctly, when
> using xfthb with emacs-27, I observe the same issue. Which may suggest
> that perhaps the issue is related to Emacs's HarfBuzz support.
But that's exactly the configuration that was fixed...
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Wed, 22 Jul 2020 02:13:02 GMT)
Message #116 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Amin Bandali <bandali <at> gnu.org>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>, valizadeh.ho <at> gmail.com,
>> 41005 <at> debbugs.gnu.org, Pip Cet <pipcet <at> gmail.com>
>> Date: Tue, 21 Jul 2020 13:53:25 -0400
>>
>> Lastly, it might be worth mentioning that if I recall correctly, when
>> using xfthb with emacs-27, I observe the same issue. Which may suggest
>> that perhaps the issue is related to Emacs's HarfBuzz support.
>
> But that's exactly the configuration that was fixed...
>
It is strange. I did some more testing. Whether with xfthb (emacs-27)
or ftcrhb (master), it seems like typing in Persian in *scratch* works
okay. However, if I paste (yank) Persian text, e.g. from Wikipedia,
into *scratch*, the issue surfaces and yanked text is garbled. Further,
Persian text in Gnus's article-mode and in message-mode is always
garbled to begin with. It does seem like a HarfBuzz issue to me.
Please feel free to grab the Vazir font and test for yourself.
Hossein, do you also get this behaviour with Emacs's HarfBuzz?
Any other users using Emacs for Persian or Arabic? I wonder if this
issue happens for other right-to-left languages, e.g. Hebrew.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Wed, 22 Jul 2020 14:21:02 GMT)
Message #119 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Amin Bandali <bandali <at> gnu.org>
> Cc: rpluim <at> gmail.com, valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org,
> pipcet <at> gmail.com
> Date: Tue, 21 Jul 2020 22:12:19 -0400
>
> >> Lastly, it might be worth mentioning that if I recall correctly, when
> >> using xfthb with emacs-27, I observe the same issue. Which may suggest
> >> that perhaps the issue is related to Emacs's HarfBuzz support.
> >
> > But that's exactly the configuration that was fixed...
> >
>
> It is strange. I did some more testing. Whether with xfthb (emacs-27)
> or ftcrhb (master), it seems like typing in Persian in *scratch* works
> okay. However, if I paste (yank) Persian text, e.g. from Wikipedia,
> into *scratch*, the issue surfaces and yanked text is garbled. Further,
> Persian text in Gnus's article-mode and in message-mode is always
> garbled to begin with. It does seem like a HarfBuzz issue to me.
> Please feel free to grab the Vazir font and test for yourself.
Does this happen only with that font, or does it happen with any font
that supports Persian?
If it's only that font, then I don't think we should try solving this
in Emacs; please report this to the font developers.
If the problem happens with any Persian-supporting font, then please
tell the details: which page you copy/paste from, what browser did you
use to copy that text, detailed steps for how to reproduce in Gnus,
etc. This bug report described a problem that happened in different
situation (the Emacs EWW browser), so it's hard to know what exactly
happens in your case.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Fri, 24 Jul 2020 04:12:01 GMT)
Message #122 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Amin Bandali <bandali <at> gnu.org>
>> Cc: rpluim <at> gmail.com, valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org,
>> pipcet <at> gmail.com
>> Date: Tue, 21 Jul 2020 22:12:19 -0400
>>
>> >> Lastly, it might be worth mentioning that if I recall correctly, when
>> >> using xfthb with emacs-27, I observe the same issue. Which may suggest
>> >> that perhaps the issue is related to Emacs's HarfBuzz support.
>> >
>> > But that's exactly the configuration that was fixed...
>> >
>>
>> It is strange. I did some more testing. Whether with xfthb (emacs-27)
>> or ftcrhb (master), it seems like typing in Persian in *scratch* works
>> okay. However, if I paste (yank) Persian text, e.g. from Wikipedia,
>> into *scratch*, the issue surfaces and yanked text is garbled. Further,
>> Persian text in Gnus's article-mode and in message-mode is always
>> garbled to begin with. It does seem like a HarfBuzz issue to me.
>> Please feel free to grab the Vazir font and test for yourself.
>
> Does this happen only with that font, or does it happen with any font
> that supports Persian?
>
> If it's only that font, then I don't think we should try solving this
> in Emacs; please report this to the font developers.
>
It appears to happen with any font supporting Persian/Arabic. Examples
include DejaVu Sans and Noto Sans Arabic.
>
> If the problem happens with any Persian-supporting font, then please
> tell the details: which page you copy/paste from, what browser did you
> use to copy that text, detailed steps for how to reproduce in Gnus,
> etc. This bug report described a problem that happened in different
> situation (the Emacs EWW browser), so it's hard to know what exactly
> happens in your case.
>
> Thanks.
>
Examples of pages I copied excerpts from include the front page of the
Persian Wikipedia <https://fa.wikipedia.org>, as well as the Persian
translation of GNU's homepage <https://www.gnu.org/home.fa.html>. The
issue does not seem to be specific to a particular text, and appears to
occur when pasting any Persian text into a buffer. As for the browser,
I tried with both GNU IceCat and Emacs's EWW; same results. I am able
to reproduce by pasting any random Persian text into any Emacs buffer.
In the case of Gnus, simply pressing RET on the subject of an email in
gnus-summary-mode to have it displayed using gnus-article-mode shows the
Persian text in the email body as garbled.
Hope this helps.
* * *
Actually, as I was about to hit send, it /just/ occurred to me to try
with -Q or -q, and in both cases I do not see this bug! How strange!
I'll start bisecting my Emacs configuration and will report back if I
manage to find the cause.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Fri, 24 Jul 2020 06:10:02 GMT)
Message #125 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Amin Bandali <bandali <at> gnu.org>
> Cc: rpluim <at> gmail.com, valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org,
> pipcet <at> gmail.com
> Date: Fri, 24 Jul 2020 00:11:30 -0400
>
> Actually, as I was about to hit send, it /just/ occurred to me to try
> with -Q or -q, and in both cases I do not see this bug! How strange!
> I'll start bisecting my Emacs configuration and will report back if I
> manage to find the cause.
Yes, that would be most useful.
If the result still points into some valid Emacs use pattern, please
describe in more detail a small part of the Wikipedia page that needs
to be copy/pasted, and the precise place where the result of pasting
is rendered incorrectly. Please keep in mind that for people who
don't read Persian it is hard to find those issues in vast amounts of
text, so any measure that makes this significantly easier will allow
debugging and finding solutions to be much more efficient.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 25 Jul 2020 04:20:02 GMT)
Message #128 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
[...]
>
> Yes, that would be most useful.
>
> If the result still points into some valid Emacs use pattern, please
> describe in more detail a small part of the Wikipedia page that needs
> to be copy/pasted, and the precise place where the result of pasting
> is rendered incorrectly. Please keep in mind that for people who
> don't read Persian it is hard to find those issues in vast amounts of
> text, so any measure that makes this significantly easier will allow
> debugging and finding solutions to be much more efficient.
>
> Thanks.
>
Having done some digging, I've found at least one reliable way to
reproduce the issue with -Q:
1. launch Emacs using `emacs -Q';
2. switch to *scratch* if not there already;
3. do M-x column-number-mode RET;
4. paste the following Persian text into *scratch*:
ویکیپدیا دانشنامهای اینترنتی با بیش از ۲۸۰ زبان با محتوای آزاد است که با همکاری افراد داوطلب نوشته میشود و هر کس که به اینترنت دسترسی داشته باشد میتواند مقالههای آن را ویرایش کند.
It is from the first sentence of the "دربارهٔ ویکیپدیا" (About
Wikipedia) section on <https://fa.wikipedia.org>.
The text should appear garbled. However, if you omit step 3, the
pasted text should appear just fine. It appears that if Persian text
is pasted before column-number-mode is enabled, even if the paste is
subsequently undone before enabling column-number-mode, the issue
does *not* surface.
The above works for pasting into a message-mode buffer as well. I have
not tried to find a recipe specific to gnus-article-mode yet.
Hope this helps.
P.S. if I did not mention earlier, all of this is on a Debian Buster
GNU/Linux system.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 25 Jul 2020 06:49:02 GMT)
Message #131 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Amin Bandali <bandali <at> gnu.org>
> Cc: rpluim <at> gmail.com, valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org,
> pipcet <at> gmail.com
> Date: Sat, 25 Jul 2020 00:19:35 -0400
>
> > If the result still points into some valid Emacs use pattern, please
> > describe in more detail a small part of the Wikipedia page that needs
> > to be copy/pasted, and the precise place where the result of pasting
> > is rendered incorrectly. Please keep in mind that for people who
> > don't read Persian it is hard to find those issues in vast amounts of
> > text, so any measure that makes this significantly easier will allow
> > debugging and finding solutions to be much more efficient.
> >
> > Thanks.
> >
>
> Having done some digging, I've found at least one reliable way to
> reproduce the issue with -Q:
>
> 1. launch Emacs using `emacs -Q';
> 2. switch to *scratch* if not there already;
> 3. do M-x column-number-mode RET;
> 4. paste the following Persian text into *scratch*:
> ویکیپدیا دانشنامهای اینترنتی با بیش از ۲۸۰ زبان با محتوای آزاد است که با همکاری افراد داوطلب نوشته میشود و هر کس که به اینترنت دسترسی داشته باشد میتواند مقالههای آن را ویرایش کند.
> It is from the first sentence of the "دربارهٔ ویکیپدیا" (About
> Wikipedia) section on <https://fa.wikipedia.org>.
>
> The text should appear garbled. However, if you omit step 3, the
> pasted text should appear just fine. It appears that if Persian text
> is pasted before column-number-mode is enabled, even if the paste is
> subsequently undone before enabling column-number-mode, the issue
> does *not* surface.
Please help me understand what exactly do you mean by "garbled". The
text you show is still quite long, and I cannot easily locate it in
that page (I don't see any "About Wikipedia" in the English version of
that page), nor do I understand what you mean by "garbled".
Would it be possible instead to paste only a very small portion of the
text, and tell exactly which part(s) of that short text are garbled,
and how they are garbled? Can you, for example, post a screenshot
showing exactly which part of the Wikipedia page should be copied, and
another screenshot of the garbled text in Emacs showing which part(s)
are displayed incorrectly? And please keep the pasted text as short
as possible, because locating the garbled part(s) in text I cannot
read which is displayed in a different font from what's in the
screenshot can be a very frustrating and error-prone experience.
Also, do you copy this from EWW or from some other Web browser?
(I tried to explain all of this in my previous message, but
unfortunately this reproduction recipe again presents the same
difficulties as I tried to avoid by explaining how to provide
information that would be easy to follow up.)
Please help me understand the problem, without that I see no way of
making any progress here.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 25 Jul 2020 15:54:02 GMT)
Message #134 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
[...]
>
> Please help me understand what exactly do you mean by "garbled". The
> text you show is still quite long, and I cannot easily locate it in
> that page (I don't see any "About Wikipedia" in the English version of
> that page), nor do I understand what you mean by "garbled".
>
Choosing the Persian Wikipedia was probably not the best idea, given how
busy their page is and how much material is on there, which does not
help at all when trying to find an excerpt of text in a language one
doesn't know.
Also, sorry for not being clear about what I mean by "garbled" text.
Please see <https://en.wikipedia.org/wiki/Persian_alphabet#Letters>.
In the Persian alphabet, letters take different forms depending on where in
a word they appear. The "overview table" in the above page includes
examples of the possible contextual forms for each letter. The issue
described in this bug report is basically about Emacs using the wrong
contextual form of letters when rendering Persian text.
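To make "contextual form" concrete, here is a minimal Python sketch (not part of Emacs or HarfBuzz). It assumes only the standard unicodedata module, and the helper name contextual_forms is made up for this illustration. It lists the legacy Unicode presentation-form characters for one letter, BEH (U+0628). A shaping engine normally selects such forms through the font's own glyph substitutions rather than through these code points, so this only shows what the four positional forms are, not how Emacs picks them.

```python
import unicodedata


def contextual_forms(letter):
    """Return {position: presentation-form character} for one Arabic letter.

    Hypothetical helper for illustration: it scans the Arabic Presentation
    Forms-B block (U+FE70..U+FEFF) for characters whose compatibility
    decomposition is exactly the given base letter.
    """
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        ch = chr(cp)
        # Decompositions look like "<final> 0628"; unassigned code points
        # yield an empty string and are skipped by the length check.
        parts = unicodedata.decomposition(ch).split()
        if len(parts) == 2 and int(parts[1], 16) == ord(letter):
            forms[parts[0].strip("<>")] = ch
    return forms


beh = "\u0628"  # ARABIC LETTER BEH
for position, ch in contextual_forms(beh).items():
    print(position, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

For BEH this reports four distinct forms (isolated, final, initial, medial). The bug discussed here amounts to the display showing, say, an isolated or final shape where the neighbouring letters call for an initial or medial one.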
>
> Would it be possible instead to paste only a very small portion of the
> text, and tell exactly which part(s) of that short text are garbled,
> and how they are garbled? Can you, for example, post a screenshot
> showing exactly which part of the Wikipedia page should be copied, and
> another screenshot of the garbled text in Emacs showing which part(s)
> are displayed incorrectly? And please keep the pasted text as short
> as possible, because locating the garbled part(s) in text I cannot
> read which is displayed in a different font from what's in the
> screenshot can be a very frustrating and error-prone experience.
>
> Also, do you copy this from EWW or from some other Web browser?
>
Certainly. Instead of Persian Wikipedia, let's use the Persian
translation of the GNU homepage: <https://www.gnu.org/home.fa.html>;
specifically, the first part of the first sentence of the first
paragraph (up to and including the semicolon):
گنو یک سیستمعامل بر مبنای نرمافزار آزاد است؛
I have created a very short video screencast of me walking through
reproducing the issue, by opening <https://www.gnu.org/home.fa.html> in
Debian Buster's firefox-esr (68.10.0esr (64-bit)), copying the above
excerpt from the page, and pasting into an Emacs *scratch* using C-y:
https://p.bndl.org/emacs-persian-wrong-contextual-forms.webm
The two open Emacs instances were both launched with "emacs -Q" and are
identical; except for the second one, I did M-x column-number-mode RET
before pasting the Persian text.
I also meant to include the following in my video in case they might be
useful, but forgot to.
,----[ M-x emacs-version RET ]
| GNU Emacs 28.0.50 (build 8, x86_64-pc-linux-gnu, X toolkit, cairo
| version 1.16.0, Xaw3d scroll bars) of 2020-07-20
`----
,----[ C-h v system-configuration-options RET ]
| "--with-modules --without-gconf --without-gsettings
| --with-x-toolkit=lucid --with-xft --with-xaw3d --without-gpm
| --with-imagemagick --with-harfbuzz --prefix=/data/bandali/usr/local"
`----
>
> (I tried to explain all of this in my previous message, but
> unfortunately this reproduction recipe again presents the same
> difficulties as I tried to avoid by explaining how to provide
> information that would be easy to follow up.)
>
> Please help me understand the problem, without that I see no way of
> making any progress here.
>
> Thanks.
>
I really am trying. :-) Thank you for bearing with me here, and for
trying to help find the issue, Eli; I appreciate it.
Hope this helps.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 25 Jul 2020 16:29:01 GMT)
Message #137 received at 41005 <at> debbugs.gnu.org (full text, mbox):
> From: Amin Bandali <bandali <at> gnu.org>
> Cc: rpluim <at> gmail.com, valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org,
> pipcet <at> gmail.com
> Date: Sat, 25 Jul 2020 11:53:19 -0400
>
> > Also, do you copy this from EWW or from some other Web browser?
> >
>
> Certainly. Instead of Persian Wikipedia, let's use the Persian
> translation of the GNU homepage: <https://www.gnu.org/home.fa.html>;
> specifically, the first part of the first sentence of the first
> paragraph (up to and including the semicolon):
>
> گنو یک سیستمعامل بر مبنای نرمافزار آزاد است؛
>
> I have created a very short video screencast of me walking through
> reproducing the issue, by opening <https://www.gnu.org/home.fa.html> in
> Debian Buster's firefox-esr (68.10.0esr (64-bit)), copying the above
> excerpt from the page, and pasting into an Emacs *scratch* using C-y:
>
> https://p.bndl.org/emacs-persian-wrong-contextual-forms.webm
>
> The two open Emacs instances were both launched with "emacs -Q" and are
> identical; except for the second one, I did M-x column-number-mode RET
> before pasting the Persian text.
OK, thanks. Now everything is clear: at the time we discussed this,
almost 2 months ago, Pip Cet proposed a patch, which I asked him to
install. But it turns out it was never actually installed, and so
this particular situation was indeed not fixed.
I have now installed that patch on the emacs-27 branch, and the
problem is gone.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#41005; Package emacs. (Sat, 25 Jul 2020 16:45:01 GMT)
Message #140 received at 41005 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
[...]
>
> OK, thanks. Now everything is clear: at the time we discussed this,
> almost 2 months ago, Pip Cet proposed a patch, which I asked him to
> install. But it turns out it was never actually installed, and so
> this particular situation was indeed not fixed.
>
Oh, I see!
>
> I have now installed that patch on the emacs-27 branch, and the
> problem is gone.
>
I tried cherry picking them onto master in my local checkout, and they
do seem to fix the issue! Looking forward to them eventually getting
properly merged into master using the script. Thanks again for your
help and patience, Eli.
Reply sent to Eli Zaretskii <eliz <at> gnu.org>: You have taken responsibility. (Sat, 25 Jul 2020 16:57:01 GMT)
Notification sent to hossein valizadeh <valizadeh.ho <at> gmail.com>: bug acknowledged by developer. (Sat, 25 Jul 2020 16:57:02 GMT)
Message #145 received at 41005-done <at> debbugs.gnu.org (full text, mbox):
> From: Amin Bandali <bandali <at> gnu.org>
> Cc: rpluim <at> gmail.com, valizadeh.ho <at> gmail.com, 41005 <at> debbugs.gnu.org,
> pipcet <at> gmail.com
> Date: Sat, 25 Jul 2020 12:44:36 -0400
>
> I tried cherry picking them onto master in my local checkout, and they
> do seem to fix the issue! Looking forward to them eventually getting
> properly merged into master using the script. Thanks again for your
> help and patience, Eli.
Thanks, so I think we can finally close this bug.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 23 Aug 2020 11:24:05 GMT)
This bug report was last modified 4 years and 303 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.