GNU bug report logs - #68254
EWW ‘readable’ by default
Reported by: Navajeeth <yvv0 <at> proton.me>
Date: Fri, 5 Jan 2024 07:37:02 UTC
Severity: minor
Done: Jim Porter <jporterbugs <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 68254 in the body.
You can then email your comments to 68254 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Fri, 05 Jan 2024 07:37:02 GMT)
Acknowledgement sent to Navajeeth <yvv0 <at> proton.me>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 05 Jan 2024 07:37:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
I nearly always prefer reading webpages in EWW after running the eww-readable command. Can it be possible to have EWW open webpages in the ‘readable’ view by default, but let you display the full (pre–eww-readable) render of a webpage with a command? I.e. have an inverse of the current setup, where you have to manually toggle the readable view.
—Navajeeth
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Fri, 05 Jan 2024 11:54:01 GMT)
Message #8 received at 68254 <at> debbugs.gnu.org:
> Date: Fri, 05 Jan 2024 07:35:56 +0000
> From: Navajeeth via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>
> I nearly always prefer reading webpages in EWW after running the eww-readable command. Can
> it be possible to have EWW open webpages in the ‘readable’ view by default, but let you display the
> full (pre–eww-readable) render of a webpage with a command? I.e. have an inverse of the current
> setup, where you have to manually toggle the readable view.
Did you try to add 'eww-readable' to eww-after-render-hook?
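A minimal sketch of that suggestion, assuming it goes in the user's init file. Both `eww-after-render-hook' and `eww-readable' are the names mentioned above; nothing else is introduced.

```elisp
;; Run `eww-readable' each time EWW finishes rendering a page.
;; Hook functions are called with no arguments, which is how
;; `eww-readable' is invoked here.
(add-hook 'eww-after-render-hook #'eww-readable)
```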
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Fri, 05 Jan 2024 13:37:02 GMT)
Message #11 received at 68254 <at> debbugs.gnu.org:
[Please use Reply All to reply, so that the bug tracker is CC'ed.]
> Date: Fri, 05 Jan 2024 12:08:29 +0000
> From: Navajeeth <yvv0 <at> proton.me>
>
> I’ve tried that method. While at first it appears to work how I want, it’s sub-optimal because it clutters
> your history with two versions of every webpage you open: one the full non-readable version and then
> the readable version generated by the after-render-hook. Going back in the history is a chore,
> you need to press ‘l’ twice to go back one webpage.
>
> I used to tolerate it for a while, but now I feel that there could be a better way.
Severity set to 'wishlist' from 'normal'. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 10 Jan 2024 17:30:02 GMT)
Severity set to 'minor' from 'wishlist'. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Jan 2024 00:46:03 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Sun, 17 Mar 2024 19:27:01 GMT)
Message #18 received at 68254 <at> debbugs.gnu.org:
On 1/5/2024 5:35 AM, Eli Zaretskii wrote:
> [Please use Reply All to reply, so that the bug tracker is CC'ed.]
>
>> Date: Fri, 05 Jan 2024 12:08:29 +0000
>> From: Navajeeth <yvv0 <at> proton.me>
>>
>> I’ve tried that method. While at first it appears to work how I want, it’s sub-optimal because it clutters
>> your history with two versions of every webpage you open: one the full non-readable version and then
>> the readable version generated by the after-render-hook. Going back in the history is a chore,
>> you need to press ‘l’ twice to go back one webpage.
>>
>> I used to tolerate it for a while, but now I feel that there could be a better way.
Here's a patch for this. It turns 'eww-readable' into a toggle (using
the same semantics as minor modes), and also adds an option to prevent
adding a new history entry for each call.
After this patch, you could set 'eww-readable-adds-to-history' to nil
and add 'eww-readable' to 'eww-after-render-hook', and then everything
should work ok. With those settings, you could then call 'eww-readable'
to display the full page if needed.
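As a rough sketch, the setup described above might look like this in an init file. It assumes an Emacs build that includes the patch, since `eww-readable-adds-to-history' is the option the patch introduces.

```elisp
;; Assumes the patched EWW: do not push a separate history entry
;; each time `eww-readable' is called.
(setq eww-readable-adds-to-history nil)

;; Switch every freshly rendered page to the readable view.
(add-hook 'eww-after-render-hook #'eww-readable)
```

With these settings, calling `eww-readable' interactively should toggle back to the full page.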
(There might be some value in adding another new option that lets you
specify a list of regexps to match pages that should start in readable
mode; then it would be easy for users to enable that for
"https://example\.com/.*" or similar. We can do that later if there's
any demand for it, though.)
[0001-Allow-toggling-readable-mode-in-EWW.patch (text/plain, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Mon, 18 Mar 2024 04:34:02 GMT)
Message #21 received at 68254 <at> debbugs.gnu.org:
Hi all,
I'm not sure it would be a good idea to enable eww-readable by default.
IME eww-readable is not reliably effective enough to be used by default.
I think that if it were, too many users would find that EWW would
produce unusable results by default, and they'd likely blame EWW itself
rather than eww-readable, being unaware that eww-readable were even
involved.
I wish this weren't the case, but the modern Web is too, er, modern, I'm
afraid.
I like Jim's idea of having an option of URL-matching regexps that
automatically activate eww-readable. That does sound useful.
My two cents.
--Adam
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Mon, 18 Mar 2024 05:19:01 GMT)
Message #24 received at 68254 <at> debbugs.gnu.org:
Thank you so much for the patch, @Jim! I dunno how to apply patches, but I’ll learn and try yours out as soon as I can.
Having regexps to match to turn readability on is a good start. I hope there will be a more convenient way to do that, other than having to manually add to that list in your init.el; maybe a function that proactively asks you, when you apply readability, if you’d like to add it to that list with a ‘y or n’.
Albeit I find myself opening a lot of small blogs and personal websites in EWW. A lot of different domain names. Both a function that asks to automatically add to a readability-on list and manually adding to that list sound like a hassle.
I think a better way to go would be to have a readability-off list for the readability-minor-mode. In my experience, with the kind of sites I open with EWW (textual sites without a lot of graphics or JavaScript), the list of ones where ‘eww-readable’ doesn’t work is a lot smaller than the ones where it does.
But I agree with @Adam that readability shouldn’t be on as the default behaviour. I gave this thread a bad subject line. I meant in the sense: I wanted the option to turn it on and replace the default behaviour for me, because I was finding that most of the sites I was opening with EWW were working better with readability. And perhaps have it as an option for everyone to turn on.
—Navajeeth
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Mon, 18 Mar 2024 05:50:02 GMT)
Message #27 received at 68254 <at> debbugs.gnu.org:
On 3/17/2024 9:32 PM, Adam Porter wrote:
> I'm not sure it would be a good idea to enable eww-readable by default.
> IME eww-readable is not reliably effective enough to be used by default.
> I think that if it were, too many users would find that EWW would
> produce unusable results by default, and they'd likely blame EWW itself
> rather than eww-readable, being unaware that eww-readable were even
> involved.
I agree overall. It's hard to know for sure if a web page will look ok
in readable mode without trying it first.
That's why I opted to keep the default behavior unchanged in my patch.
It just makes it possible to add 'eww-readable' to
'eww-after-render-hook' without producing duplicate history entries.
That way, if most of the pages you visit *are* readable, you can set it
up like that and still get to the full view by calling 'eww-readable' again.
> I like Jim's idea of having an option of URL-matching regexps that
> automatically activate eww-readable. That does sound useful.
Yeah, I think I might add that in, since 1) I'd find it useful, 2) it
should be easy, and 3) the Safari browser already supports this, so
there's already precedent elsewhere. (It's arguably even more relevant
for EWW than Safari, since many webpages are a real mess in EWW without
readable-mode.)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Mon, 18 Mar 2024 05:53:02 GMT)
Message #30 received at 68254 <at> debbugs.gnu.org:
On 3/17/2024 10:17 PM, Navajeeth via Bug reports for GNU Emacs, the
Swiss army knife of text editors wrote:
> Thank you so much for the patch, @Jim! I dunno how to apply patches, but
> I’ll learn and try yours out as soon as I can.
Even though I contribute to Emacs, I tend to use the latest proper
release as my daily editor (so I'm using 29.2 now).[1] If I want to use
a patch I wrote on an older release, I take the new version of all the
relevant functions and then override the old ones using 'advice-add':
  (defun updated-eww-readable (&optional arg)
    ;; new implementation here
    )
  (advice-add 'eww-readable :override 'updated-eww-readable)
If the patch has merged to the master branch, I usually wrap that with
'(when (< emacs-major-version 30) ...)' so that it doesn't do anything
on the master builds, and also so I know to remove it when 30.1 comes
out and I prune my init.el.
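A sketch of that version guard follows. The function name `my/updated-eww-readable' is hypothetical and its body is only a placeholder for the patched implementation, which is not reproduced here.

```elisp
;; Only install the override on releases older than 30, where the
;; patched `eww-readable' is not yet built in.
(when (< emacs-major-version 30)
  (defun my/updated-eww-readable (&optional arg)
    "Stand-in for the patched `eww-readable' (hypothetical name)."
    ;; Placeholder: replace these two forms with the new implementation.
    (ignore arg)
    (user-error "Patched `eww-readable' body not filled in yet"))
  ;; Replace the built-in command with the function defined above.
  (advice-add 'eww-readable :override #'my/updated-eww-readable))
```

Once the installed Emacs reaches major version 30, the whole form becomes a no-op and can be deleted.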
> I think a better way to go would be to have a /readability-off/ list for
> the readability-minor-mode. In my experience, with the kind of sites I
> open with EWW (textual sites without a lot of graphics or JavaScript),
> the list of ones where ‘eww-readable’ doesn’t work is a lot smaller than
> the ones where it does.
I was thinking about doing something like this. The list of regexps
could include a way to say both "if this regexp matches, use readable
mode" and "if this regexp matches, *don't* use readable mode". Then you
could make the list look something like this:
  '(("^https://example.com/" . not-readable)
    ".*")
That would make every page except those from https://example.com use
readable mode. I think that would be the most flexible for complex
cases, while still being simple for the common case (a list of "plain"
regexps for readable-mode pages). It would also make it easy to have
most of a site (except for one section) use readable-mode.
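To illustrate how such a list could be consulted, here is a small self-contained sketch. The function `my/readable-by-default-p' is hypothetical, and the rule that the first matching entry wins is an assumption made for this example, not something stated in the proposal.

```elisp
(defun my/readable-by-default-p (url rules)
  "Return non-nil if URL should start in readable mode according to RULES.
RULES is a list whose elements are either a regexp string (matching
URLs use readable mode) or a cons (REGEXP . not-readable) (matching
URLs use the full view).  The first matching element wins (an
assumption of this sketch)."
  (catch 'found
    (dolist (rule rules)
      (let ((regexp (if (consp rule) (car rule) rule))
            ;; A plain string has no cdr, so it counts as readable.
            (readable (not (eq (cdr-safe rule) 'not-readable))))
        (when (string-match-p regexp url)
          (throw 'found readable))))
    ;; No entry matched: keep the default full view.
    nil))

;; Usage: everything except example.com starts in readable mode.
(let ((rules '(("^https://example\\.com/" . not-readable) ".*")))
  (list (my/readable-by-default-p "https://example.com/blog" rules)
        (my/readable-by-default-p "https://www.gnu.org/" rules)))
;; => (nil t)
```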
[1] Mainly I just want to avoid having to worry about updating Emacs
master and then ending up with a broken Emacs. Murphy's Law dictates
that that will always occur at the worst possible time.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Mon, 18 Mar 2024 12:39:01 GMT)
Message #33 received at 68254 <at> debbugs.gnu.org:
> Date: Sun, 17 Mar 2024 12:24:26 -0700
> Cc: 68254 <at> debbugs.gnu.org
> From: Jim Porter <jporterbugs <at> gmail.com>
>
> Here's a patch for this. It turns 'eww-readable' into a toggle (using
> the same semantics as minor modes), and also adds an option to prevent
> adding a new history entry for each call.
Thanks.
> +When called interactively, this command toggles the display of the
> +readable parts. With a positive prefix argument, always display the
> +readable parts, and with a zero or negative prefix, display the full
> +page.
The imperative form ("display") is what we use in the doc strings, but
it is not really appropriate for the manual. Here we say "the
function displays" or "it displays" instead, which is consistent with
the first sentence in the above paragraph.
> +(defun eww-readable (&optional arg)
> + "Toggle display of only the main \"readable\" parts of the current web page.
> This command uses heuristics to find the parts of the web page that
> -contains the main textual portion, leaving out navigation menus and
> -the like."
> - (interactive nil eww-mode)
> +contains the main textual portion, leaving out navigation menus and the
"contain" (since it refers to "parts", in plural).
> +If called interactively, toggle the display of the readable parts. If
> +the prefix argument is positive, display the readable parts, and if it
> +is zero or negative, display the full page.
> +
> +If called from Lisp, toggle the display of the readable parts if ARG is
> +`toggle'. Display the readable parts if ARG is nil, omitted, or is a
> +positive number. Display the full page if ARG is a negative number."
This doc string should mention eww-readable-adds-to-history.
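For reference, the Lisp-level semantics described in the quoted doc string could be exercised as in the sketch below. It assumes an Emacs build with this version of the patch applied; the two helper commands are hypothetical names introduced only for illustration.

```elisp
(defun my/eww-show-full-page ()
  "Show the full page in the current EWW buffer (hypothetical helper)."
  (interactive nil eww-mode)
  ;; A negative ARG asks for the full page.
  (eww-readable -1))

(defun my/eww-toggle-readable ()
  "Toggle the readable view in the current EWW buffer (hypothetical helper)."
  (interactive nil eww-mode)
  ;; The symbol `toggle' flips between the two views; a nil or
  ;; positive ARG would instead always show the readable parts.
  (eww-readable 'toggle))
```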
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Tue, 19 Mar 2024 00:03:01 GMT)
Message #36 received at 68254 <at> debbugs.gnu.org:
On 3/18/2024 5:37 AM, Eli Zaretskii wrote:
>> Date: Sun, 17 Mar 2024 12:24:26 -0700
>> Cc: 68254 <at> debbugs.gnu.org
>> From: Jim Porter <jporterbugs <at> gmail.com>
>>
>> Here's a patch for this. It turns 'eww-readable' into a toggle (using
>> the same semantics as minor modes), and also adds an option to prevent
>> adding a new history entry for each call.
>
> Thanks.
Thanks for looking. I've addressed all of your comments, and made some
more extensive changes to the implementation. I split up some of the
logic in the first patch so that it's easier to reuse without error, and
then added 'eww-readable-urls' in the second.
Because of how much I changed, I'd like to add some regression tests to
make sure everything still works correctly, but otherwise these patches
should be ready to go.
[0001-Allow-toggling-readable-mode-in-EWW.patch (text/plain, attachment)]
[0002-Add-eww-readable-urls.patch (text/plain, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Thu, 21 Mar 2024 10:53:01 GMT)
Message #39 received at 68254 <at> debbugs.gnu.org:
> Date: Mon, 18 Mar 2024 17:00:33 -0700
> Cc: 68254 <at> debbugs.gnu.org, yvv0 <at> proton.me
> From: Jim Porter <jporterbugs <at> gmail.com>
>
> Thanks for looking. I've addressed all of your comments, and made some
> more extensive changes to the implementation. I split up some of the
> logic in the first patch so that it's easier to reuse without error, and
> then added 'eww-readable-urls' in the second.
Thanks, I have some minor nits below.
> Because of how much I changed, I'd like to add some regression tests to
> make sure everything still works correctly, but otherwise these patches
> should be ready to go.
Yes, tests would be good.
> ++++
> +*** 'eww-readable' now toggles display of the readable parts of a web page.
> +When called interactively, 'eww-readable' toggles whether to display
> +only the readable parts of a page or the full page. With a positive
> +prefix argument, always display the readable parts, and with a zero or
> +negative prefix, always display the full page.
You say "toggles", but then "display". It is better to make the style
consistent.
> +(defun eww--parse-html-region (start end &optional encode)
> + "Parse the HTML between START and END, returning the DOM as an S-expression.
> +Use ENCODE to decode the region; if nil, decode as UTF-8.
It is better to call the argument DECODE, not ENCODE.
> +@vindex eww-readable-urls
> + If you want EWW to render a certain page in ``readable'' mode by
> +default, you can add a regular expression matching its URL to
> +@code{eww-readable-urls}. Each entry can either be a regular expression
> +as a string or a cons cell of the form @code{(@var{regexp}
> +. @var{readability})}. If @var{readability} is non-@code{nil}, this
^^
Please use @w to prevent breaking long expressions between two lines.
Also, please leave two spaces between sentences.
> +(defcustom eww-readable-urls nil
> + "A list of regexps matching URLs to display in readable mode by default.
> +Each element can be either a string regexp or a cons cell of the
> +form (REGEXP . READABILITY). If READABILITY is non-nil, this behaves
> +the same as the string form; otherwise, URLs matching REGEXP will never
^^^^^^^^^^^^^^^^^^^^^^^^^^^
What do you mean by "the same as the string form"? which string form?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Fri, 22 Mar 2024 05:49:01 GMT)
Message #42 received at 68254 <at> debbugs.gnu.org:
On 3/21/2024 3:51 AM, Eli Zaretskii wrote:
> Yes, tests would be good.
I've now added tests. (Good thing too, since I found a minor bug while
writing them!)
>> ++++
>> +*** 'eww-readable' now toggles display of the readable parts of a web page.
>> +When called interactively, 'eww-readable' toggles whether to display
>> +only the readable parts of a page or the full page. With a positive
>> +prefix argument, always display the readable parts, and with a zero or
>> +negative prefix, always display the full page.
>
> You say "toggles", but then "display". It is better to make the style
> consistent.
Fixed.
>> +(defun eww--parse-html-region (start end &optional encode)
>> + "Parse the HTML between START and END, returning the DOM as an S-expression.
>> +Use ENCODE to decode the region; if nil, decode as UTF-8.
>
> It is better to call the argument DECODE, not ENCODE.
I changed this to CODING-SYSTEM, since that's what
'decode-coding-region' calls the argument.
>> +@vindex eww-readable-urls
>> + If you want EWW to render a certain page in ``readable'' mode by
>> +default, you can add a regular expression matching its URL to
>> +@code{eww-readable-urls}. Each entry can either be a regular expression
>> +as a string or a cons cell of the form @code{(@var{regexp}
>> +. @var{readability})}. If @var{readability} is non-@code{nil}, this
> ^^
> Please use @w to prevent breaking long expressions between two lines.
> Also, please leave two spaces between sentences.
Thanks, both fixed. I never knew about @w.
>> +(defcustom eww-readable-urls nil
>> + "A list of regexps matching URLs to display in readable mode by default.
>> +Each element can be either a string regexp or a cons cell of the
>> +form (REGEXP . READABILITY). If READABILITY is non-nil, this behaves
>> +the same as the string form; otherwise, URLs matching REGEXP will never
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> What do you mean by "the same as the string form"? which string form?
I've tried to clarify this. By "string form", I meant the "string
regexp" mentioned previously; in my new patch, I describe that as "a
regular expression in string form".
[0001-Allow-toggling-readable-mode-in-EWW.patch (text/plain, attachment)]
[0002-Add-eww-readable-urls.patch (text/plain, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#68254; Package emacs. (Sat, 23 Mar 2024 07:50:01 GMT)
Message #45 received at 68254 <at> debbugs.gnu.org:
> Date: Thu, 21 Mar 2024 22:46:50 -0700
> Cc: 68254 <at> debbugs.gnu.org, yvv0 <at> proton.me
> From: Jim Porter <jporterbugs <at> gmail.com>
>
> I've tried to clarify this. By "string form", I meant the "string
> regexp" mentioned previously; in my new patch, I describe that as "a
> regular expression in string form".
Thanks. The updated changeset LGTM, with a single minor comment:
> +(defcustom eww-readable-urls nil
> + "A list of regexps matching URLs to display in readable mode by default.
> +Each element can be one of the following forms: a regular expression in
> +string form or a cons cell of the form (REGEXP . READABILITY). If
> +READABILITY is non-nil, this behaves the same as the string form;
> +otherwise, URLs matching REGEXP will never be displayed in readable mode
> +by default."
The doc string of this user option should explain what is the
"readable mode", or at least have a hyper-link to eww-readable (which
does explain that). Users who read this doc string should understand
what that mode does, and (unlike in the manual) there's no prior
context to rely upon.
Reply sent to Jim Porter <jporterbugs <at> gmail.com>: You have taken responsibility. (Sat, 23 Mar 2024 17:28:01 GMT)
Notification sent to Navajeeth <yvv0 <at> proton.me>: bug acknowledged by developer. (Sat, 23 Mar 2024 17:28:01 GMT)
Message #50 received at 68254-done <at> debbugs.gnu.org:
On 3/23/2024 12:48 AM, Eli Zaretskii wrote:
> The doc string of this user option should explain what is the
> "readable mode", or at least have a hyper-link to eww-readable (which
> does explain that). Users who read this doc string should understand
> what that mode does, and (unlike in the manual) there's no prior
> context to rely upon.
Good point. I added the following to the docstring: "EWW will display
matching URLs using `eww-readable' (which see)." I also merged this to
the master branch as 4b0f5cdb01f, so closing this bug.
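A configuration sketch using the option as its doc string describes it, assuming an Emacs build that includes the merged change. The URLs are placeholders, and the assumption that entries are tried in order (so the more specific exclusion goes first) is not confirmed in this thread.

```elisp
;; Start every page in readable mode, except pages on example.com.
;; A cons with a nil READABILITY means "never readable by default";
;; a plain string means "readable by default".
(setq eww-readable-urls
      '(("^https://example\\.com/" . nil)
        ".*"))
```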
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 21 Apr 2024 11:24:15 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.