GNU bug report logs - #54657
29.0.50; 100% CPU usage with eww on https://blogsurf.io/

Package: emacs;

Reported by: dal-blazej <at> onenetbeyond.org

Date: Thu, 31 Mar 2022 19:50:01 UTC

Severity: normal

Tags: moreinfo

Found in version 29.0.50

Done: dal-blazej <at> onenetbeyond.org

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 54657 in the body.
You can then email your comments to 54657 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Thu, 31 Mar 2022 19:50:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to dal-blazej <at> onenetbeyond.org:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 31 Mar 2022 19:50:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: dal-blazej <at> onenetbeyond.org
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; 100% CPU usage with eww on https://blogsurf.io/
Date: Thu, 31 Mar 2022 21:49:40 +0200
Hi,

I was surprised to see that particular site with eww leads to 100% CPU
usage for 2/3 minutes.

See profiler output below :

---------- CPU
     83,835,054  84% - url-http-generic-filter
     83,687,707  84%  - url-http-content-length-after-change-function
     82,024,107  82%   - url-http-activate-callback
     82,022,955  82%    - eww-render
     55,943,666  56%     - eww-display-html
     30,476,531  30%      - funcall-with-delayed-message
     30,476,531  30%       + #<compiled 0x6f61fe78c524dc6>
            704   0%        plist-put
             21   0%        url-generic-parse-url
     13,373,648  13%     + eww--after-page-change
          4,255   0%     + mail-header-parse-content-type
          1,098   0%     + url-generic-parse-url
          1,056   0%     + set-buffer-file-coding-system
      1,370,741   1%     file-size-human-readable-iec
          5,088   0%   + url-http-parse-headers
          8,360   0%  + url-http-wait-for-headers-change-function
     14,811,644  14% + command-execute
        186,472   0% + redisplay_internal (C function)
         26,027   0% + url-http-async-sentinel
         22,984   0% + timer-event-handler
         21,350   0% + nsm-verify-connection
            232   0% + gui-set-selection
            232   0% + deactivate-mark
             24   0% + eldoc-schedule-timer
              0   0%   ...
----------

---------- RAM
       50462  98% - url-http-generic-filter
       50458  98%  - url-http-content-length-after-change-function
       50366  98%   - url-http-activate-callback
       50366  98%    - eww-render
       50357  97%     - eww-display-html
          89   0%      + funcall-with-delayed-message
           1   0%     + eww--after-page-change
          74   0%     file-size-human-readable-iec
         689   1% + ...
         230   0% + command-execute
           8   0% + redisplay_internal (C function)
           2   0% + timer-event-handler
           1   0% + nsm-verify-connection
----------

I usually find eww not too CPU-hungry, so I thought you might be
interested in this report.



In GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
 of 2022-03-31 built on localhost
Repository revision: 948181df9cbdcc8845fc3662e2007d8e09f48c71
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)

Configured using:
 'configure --build x86_64-linux-gnu --prefix=/opt/emacs
 --with-mailutils --with-sound=yes --without-gconf --without-gsettings
 --with-x=yes --without-toolkit-scroll-bars --with-x-toolkit=gtk3
 --with-json --with-native-compilation --with-xwidgets
 build_alias=x86_64-linux-gnu 'CFLAGS=-O2 -Wall ''

Configured features:
CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS HARFBUZZ JPEG JSON LIBSELINUX
LIBXML2 MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER PNG SECCOMP SOUND
SQLITE3 THREADS TIFF X11 XDBE XIM XPM XWIDGETS GTK3 ZLIB

Important settings:
  value of $LANG: en_US
  locale-coding-system: utf-8

Major mode: Summary

Load-path shadows:
/home/user/.emacs.d/elpa/transient-0.3.7/transient hides /opt/emacs/share/emacs/29.0.50/lisp/transient




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Fri, 01 Apr 2022 06:05:02 GMT) Full text and rfc822 format available.

Message #8 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dal-blazej <at> onenetbeyond.org
Cc: 54657 <at> debbugs.gnu.org
Subject: Re: bug#54657: 29.0.50;
 100% CPU usage with eww on https://blogsurf.io/
Date: Fri, 01 Apr 2022 09:05:02 +0300
> Date: Thu, 31 Mar 2022 21:49:40 +0200
> From: dal-blazej--- via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> I was surprised to see that particular site with eww leads to 100% CPU
> usage for 2/3 minutes.
> 
> See profiler output below :
> 
> ---------- CPU
>      83,835,054  84% - url-http-generic-filter
>      83,687,707  84%  - url-http-content-length-after-change-function
>      82,024,107  82%   - url-http-activate-callback
>      82,022,955  82%    - eww-render
>      55,943,666  56%     - eww-display-html
>      30,476,531  30%      - funcall-with-delayed-message
>      30,476,531  30%       + #<compiled 0x6f61fe78c524dc6>

Type 'v' to see the page's source, and you will immediately understand
why.  99% of that page is a huge JS script that is basically a single
humongous line whose length is 12MB.  Even Less chokes on this page,
just showing its end on the screen.

IOW, IMO this is just another instance of the well-known problem in
the display engine with very long lines.

Maybe EWW could be smarter when displaying pages with JS scripts, but
how many pages out there have something similar?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sat, 02 Apr 2022 15:34:02 GMT) Full text and rfc822 format available.

Message #11 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: dal-blazej <at> onenetbeyond.org
Cc: 54657 <at> debbugs.gnu.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sat, 02 Apr 2022 17:33:22 +0200
dal-blazej <at> onenetbeyond.org writes:

> I was surprised to see that particular site with eww leads to 100% CPU
> usage for 2/3 minutes.

I'm unable to reproduce the problem -- rendering that site is nearly
instantaneous for me.  But perhaps it's serving out different HTML to
you?  Try hitting `v', save the HTML somewhere, and post the resulting
URL, and I can take a look.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Added tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sat, 02 Apr 2022 15:34:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sat, 02 Apr 2022 15:36:01 GMT) Full text and rfc822 format available.

Message #16 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sat, 02 Apr 2022 17:35:03 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> Type 'v' to see the page's source, and you will immediately understand
> why.  99% of that page is a huge JS script that is basically a single
> humongous line whose length is 12MB.

eww doesn't render <script> things, so it doesn't matter how long they
are for eww.  (It does matter for libxml-parse-html-region, but it
should be able to parse 12MB of <script> in a few milliseconds.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sat, 02 Apr 2022 15:46:01 GMT) Full text and rfc822 format available.

Message #19 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sat, 02 Apr 2022 18:45:20 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: dal-blazej <at> onenetbeyond.org,  54657 <at> debbugs.gnu.org
> Date: Sat, 02 Apr 2022 17:35:03 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Type 'v' to see the page's source, and you will immediately understand
> > why.  99% of that page is a huge JS script that is basically a single
> > humongous line whose length is 12MB.
> 
> eww doesn't render <script> things, so it doesn't matter how long they
> are for eww.

What do you mean by "doesn't render"?  Does it traverse it?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sat, 02 Apr 2022 16:05:02 GMT) Full text and rfc822 format available.

Message #22 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sat, 02 Apr 2022 18:04:38 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> What do you mean by "doesn't render"?  Does it traverse it?

Yes.  A <script> is a node containing one string (which is ignored).

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
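
The point about the shape of a <script> node can be checked on a small document. The following is a minimal sketch, assuming an Emacs built with libxml2 and the bundled dom.el library; `my/script-node-summary' is a hypothetical helper name, not something from eww or shr.

```elisp
(require 'dom)

(defun my/script-node-summary (html)
  "Parse the string HTML and describe each <script> node in it.
Return a list of (CHILD-COUNT . TEXT-LENGTH) pairs, one per node.
Hypothetical helper for illustration; not part of Emacs."
  (unless (libxml-available-p)
    (user-error "This Emacs was built without libxml2 support"))
  (with-temp-buffer
    (insert html)
    (let ((dom (libxml-parse-html-region (point-min) (point-max))))
      (mapcar
       (lambda (node)
         (let ((text-length 0))
           ;; Sum the lengths of the string children only.
           (dolist (child (dom-children node))
             (when (stringp child)
               (setq text-length (+ text-length (length child)))))
           (cons (length (dom-children node)) text-length)))
       (dom-by-tag dom 'script)))))
```

Evaluating (my/script-node-summary "<html><head><script>var x = 1;</script></head><body><p>hi</p></body></html>") shows how many children the script node has and how much text they carry. A renderer that skips the node never has to look inside that text.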




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sat, 02 Apr 2022 16:19:02 GMT) Full text and rfc822 format available.

Message #25 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sat, 02 Apr 2022 19:18:12 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: dal-blazej <at> onenetbeyond.org,  54657 <at> debbugs.gnu.org
> Date: Sat, 02 Apr 2022 18:04:38 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > What do you mean by "doesn't render"?  Does it traverse it?
> 
> Yes.  A <script> is a node containing one string (which is ignored).

And is shr-fill-lines involved in this in any way, shape or form?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sat, 02 Apr 2022 16:30:02 GMT) Full text and rfc822 format available.

Message #28 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sat, 02 Apr 2022 18:29:02 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> And is shr-fill-lines involved in this in any way, shape or form?

The contents of the <script> node are ignored, so no.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sat, 02 Apr 2022 17:18:02 GMT) Full text and rfc822 format available.

Message #31 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sat, 02 Apr 2022 20:17:38 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: dal-blazej <at> onenetbeyond.org,  54657 <at> debbugs.gnu.org
> Date: Sat, 02 Apr 2022 17:35:03 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Type 'v' to see the page's source, and you will immediately understand
> > why.  99% of that page is a huge JS script that is basically a single
> > humongous line whose length is 12MB.
> 
> eww doesn't render <script> things, so it doesn't matter how long they
> are for eww.  (It does matter for libxml-parse-html-region, but it
> should be able to parse 12MB of <script> in a few milliseconds.)

According to what I see here, libxml-parse-html-region is indeed the
part that takes most of the CPU time, and I measured about 30 sec here
it took it to parse that page.  If you see something very different,
maybe the important factor here is the version of libxml2?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sat, 02 Apr 2022 23:56:01 GMT) Full text and rfc822 format available.

Message #34 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: dal-blazej <at> onenetbeyond.org
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 01:55:10 +0200
Maybe it's a question of horsepower?
I run Emacs with 2 CPUs and 4 GB of RAM inside a virtual machine.

CPU usage still stays at 100% for more than 3 minutes when loading the
page with a master build started with -Q. I cannot see the source from
within eww: it waits for 2-3 minutes, then the CPU spikes for 3
minutes, and finally it shows only the rendered HTML, not the source.
I tried 3 times.

I can see the source from Firefox and yes, that's ... yikes.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 00:01:02 GMT) Full text and rfc822 format available.

Message #37 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: dal-blazej <at> onenetbeyond.org
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 01:59:26 +0200
See the attached archive.

[blogsurf.tar (application/x-tar, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 11:52:01 GMT) Full text and rfc822 format available.

Message #40 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: dal-blazej <at> onenetbeyond.org
Cc: 54657 <at> debbugs.gnu.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 13:51:30 +0200
dal-blazej <at> onenetbeyond.org writes:

> See the attached archive.

Something went wrong when you saved the source -- it's just a fragment
of the page, and it's encoded as...  I don't even know what to call it.
"Meta-HTML"?  That is, every < is encoded as &lt; etc, and then put into
a number of <span> statements.

What did you use to get that?  It's uniquely weird.  :-)

Instead, use the `v' command in eww to get the source buffer, and save
that instead.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 11:53:01 GMT) Full text and rfc822 format available.

Message #43 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 13:52:42 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> According to what I see here, libxml-parse-html-region is indeed the
> part that takes most of the CPU time, and I measured about 30 sec here
> it took it to parse that page.  If you see something very different,
> maybe the important factor here is the version of libxml2?

Wow.  If you call

(benchmark-run 1 (libxml-parse-html-region (point-min) (point-max)))

in the 12MB source buffer for that page, it reports 30 seconds?  It
reports 0.01 seconds for me.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
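
For anyone repeating the measurement, the one-liner above can be wrapped so that it fails loudly when libxml2 is missing. This is a minimal sketch; `my/time-html-parse' is a hypothetical name, and the helper only times the parse, not the rest of eww's rendering.

```elisp
(require 'benchmark)

(defun my/time-html-parse (&optional buffer)
  "Return the seconds spent parsing BUFFER as HTML with libxml2.
BUFFER defaults to the current buffer.
Hypothetical helper for illustration; not part of Emacs."
  (unless (libxml-available-p)
    (user-error "This Emacs was built without libxml2 support"))
  (with-current-buffer (or buffer (current-buffer))
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-TIME); keep ELAPSED.
    (car (benchmark-run 1
           (libxml-parse-html-region (point-min) (point-max))))))
```

Call it with M-: (my/time-html-parse) in the buffer that holds the saved page source. Comparing the result across machines with different libxml2 versions separates parser time from everything else eww does.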




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 12:07:01 GMT) Full text and rfc822 format available.

Message #46 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 14:06:43 +0200
On Apr 03 2022, Lars Ingebrigtsen wrote:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>> According to what I see here, libxml-parse-html-region is indeed the
>> part that takes most of the CPU time, and I measured about 30 sec here
>> it took it to parse that page.  If you see something very different,
>> maybe the important factor here is the version of libxml2?
>
> Wow.  If you call
>
> (benchmark-run 1 (libxml-parse-html-region (point-min) (point-max)))
>
> in the 12MB source buffer for that page, it reports 30 seconds?  It
> reports 0.01 seconds for me.

I'm getting 46 seconds.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 12:09:02 GMT) Full text and rfc822 format available.

Message #49 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 15:07:53 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: dal-blazej <at> onenetbeyond.org,  54657 <at> debbugs.gnu.org
> Date: Sun, 03 Apr 2022 13:52:42 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > According to what I see here, libxml-parse-html-region is indeed the
> > part that takes most of the CPU time, and I measured about 30 sec here
> > it took it to parse that page.  If you see something very different,
> > maybe the important factor here is the version of libxml2?
> 
> Wow.  If you call
> 
> (benchmark-run 1 (libxml-parse-html-region (point-min) (point-max)))
> 
> in the 12MB source buffer for that page, it reports 30 seconds?  It
> reports 0.01 seconds for me.

I didn't do the above, I just stepped through eww-display-html in
Edebug, and looked at my watch (30 sec is easy to measure without any
instruments).

So what is your version of libxml2?  Maybe that is the important
aspect here.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 12:22:01 GMT) Full text and rfc822 format available.

Message #52 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: larsi <at> gnus.org
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50;
 100% CPU usage with eww on https://blogsurf.io/
Date: Sun, 03 Apr 2022 15:21:14 +0300
> Resent-From: Eli Zaretskii <eliz <at> gnu.org>
> Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
> Resent-CC: bug-gnu-emacs <at> gnu.org
> Resent-Sender: help-debbugs <at> gnu.org
> Date: Sun, 03 Apr 2022 15:07:53 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
> 
> > From: Lars Ingebrigtsen <larsi <at> gnus.org>
> > Cc: dal-blazej <at> onenetbeyond.org,  54657 <at> debbugs.gnu.org
> > Date: Sun, 03 Apr 2022 13:52:42 +0200
> > 
> > Wow.  If you call
> > 
> > (benchmark-run 1 (libxml-parse-html-region (point-min) (point-max)))
> > 
> > in the 12MB source buffer for that page, it reports 30 seconds?  It
> > reports 0.01 seconds for me.
> 
> I didn't do the above, I just stepped through eww-display-html in
> Edebug, and looked at my watch (30 sec is easy to measure without any
> instruments).
> 
> So what is your version of libxml2?  Maybe that is the important
> aspect here.

And in addition, I hope you verified that when
libxml-parse-html-region finishes in your case after so little time,
it returns the document's DOM, not something trivial?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 12:23:01 GMT) Full text and rfc822 format available.

Message #55 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 14:21:55 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> I didn't do the above, I just stepped through eww-display-html in
> Edebug, and looked at my watch (30 sec is easy to measure without any
> instruments).

Well, that doesn't say whether it's libxml2 or something else...  could
you try the benchmark-run?

> So what is your version of libxml2?  Maybe that is the important
> aspect here.

apt says:

libxml2/testing,now 2.9.13+dfsg-1 amd64 [installed,automatic]

I guess it could also have something to do with how we're interfacing
with the library -- perhaps we're creating strings (or something) in a
sub-optimal way on some architectures and not others?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 12:23:02 GMT) Full text and rfc822 format available.

Message #58 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 54657 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 14:22:34 +0200
Andreas Schwab <schwab <at> linux-m68k.org> writes:

> I'm getting 46 seconds.

Wow.  What OS is this on, and what's the libxml2 version?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 12:27:02 GMT) Full text and rfc822 format available.

Message #61 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 14:25:58 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> And in addition, I hope you verified that when
> libxml-parse-html-region finishes in your case after so little time,
> it returns the document's DOM, not something trivial?

The results look fine to me.  (memory-report-object-size
(libxml-parse-html-region (point-min) (point-max))) reports a 10MB
structure.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
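
Another way to confirm that a fast parse returned a real tree rather than something trivial is to count its element nodes. This is a minimal sketch, assuming the list-based tree format that libxml-parse-html-region returns and the bundled dom.el library; `my/dom-node-count' is a hypothetical name.

```elisp
(require 'dom)

(defun my/dom-node-count (dom)
  "Return the number of element nodes in DOM.
DOM is a tree in the format returned by `libxml-parse-html-region'.
Text children (strings) are not counted.
Hypothetical helper for illustration; not part of Emacs."
  (if (stringp dom)
      0
    (let ((count 1))                    ; count this element itself
      (dolist (child (dom-children dom))
        (setq count (+ count (my/dom-node-count child))))
      count)))
```

In the source buffer, (my/dom-node-count (libxml-parse-html-region (point-min) (point-max))) gives the count for the whole page. A count of only one or two nodes would suggest the parser gave up early.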




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 12:45:02 GMT) Full text and rfc822 format available.

Message #64 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 15:44:52 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: dal-blazej <at> onenetbeyond.org,  54657 <at> debbugs.gnu.org
> Date: Sun, 03 Apr 2022 14:21:55 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > I didn't do the above, I just stepped through eww-display-html in
> > Edebug, and looked at my watch (30 sec is easy to measure without any
> > instruments).
> 
> Well, that doesn't say whether it's libxml2 or something else...

Of course it does: I stepped through the code one sexp at a time, and
measured only the time it took to execute libxml-parse-html-region
(after seeing that it doesn't return quickly enough).

> libxml2/testing,now 2.9.13+dfsg-1 amd64 [installed,automatic]

It's 2.7.8 here.

> I guess it could also have something to do with how we're interfacing
> with the library -- perhaps we're creating strings (or something) in a
> sub-optimal way on some architectures and not others?

The OP is on x86_64-pc-linux-gnu.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 13:58:01 GMT) Full text and rfc822 format available.

Message #67 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 15:57:24 +0200
Name        : libxml2-2
Version     : 2.9.7
Release     : 3.37.1
Architecture: x86_64
Install Date: Sa 26 Jun 2021 14:16:06 CEST
Group       : System/Libraries
Size        : 1620022
License     : MIT
Signature   : RSA/SHA256, Fr 21 Mai 2021 16:18:38 CEST, Key ID 70af9e8139db7c82
Source RPM  : libxml2-2.9.7-3.37.1.src.rpm
Build Date  : Fr 21 Mai 2021 16:17:51 CEST
Build Host  : sheep62
Relocations : (not relocatable)
Packager    : https://www.suse.com/
Vendor      : SUSE LLC <https://www.suse.com/>
URL         : http://xmlsoft.org
Summary     : A Library to Manipulate XML Files
Description :
The XML C library was initially developed for the GNOME project. It is
now used by many programs to load and save extensible data structures
or manipulate any kind of XML files.

This library implements a number of existing standards related to
markup languages, including the XML standard, name spaces in XML, XML
Base, RFC 2396, XPath, XPointer, HTML4, XInclude, SGML catalogs, and
XML catalogs. In most cases, libxml tries to implement the
specification in a rather strict way. To some extent, it provides
support for the following specifications, but does not claim to
implement them: DOM, FTP client, HTTP client, and SAX.

The library also supports RelaxNG. Support for W3C XML Schemas is in
progress.
Distribution: SUSE Linux Enterprise 15

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 14:56:02 GMT) Full text and rfc822 format available.

Message #70 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: 54657 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 16:55:32 +0200
Andreas Schwab <schwab <at> linux-m68k.org> writes:

> Version     : 2.9.7

I tried this on a Debian/bullseye machine, which has 2.9.10, and I can
reproduce the problem there -- libxml-parse-html-region takes 20 seconds
there.

So I guess this is something the libxml people have fixed sometime
between 2.9.10 and 2.9.13.  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 15:06:01 GMT) Full text and rfc822 format available.

Message #73 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 17:05:11 +0200
On Apr 03 2022, Lars Ingebrigtsen wrote:

> So I guess this is something the libxml people have fixed sometime
> between 2.9.10 and 2.9.13.  

https://gitlab.gnome.org/GNOME/libxml2/-/commit/faea2fa9

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Sun, 03 Apr 2022 15:06:02 GMT) Full text and rfc822 format available.

Message #76 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Sun, 03 Apr 2022 18:05:10 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: 54657 <at> debbugs.gnu.org,  Eli Zaretskii <eliz <at> gnu.org>,
>   dal-blazej <at> onenetbeyond.org
> Date: Sun, 03 Apr 2022 16:55:32 +0200
> 
> Andreas Schwab <schwab <at> linux-m68k.org> writes:
> 
> > Version     : 2.9.7
> 
> I tried this on a Debian/bullseye machine, which has 2.9.10, and I can
> reproduce the problem there -- libxml-parse-html-region takes 20 seconds
> there.
> 
> So I guess this is something the libxml people have fixed sometime
> between 2.9.10 and 2.9.13.  

I guess this should be in PROBLEMS?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Mon, 04 Apr 2022 10:32:02 GMT) Full text and rfc822 format available.

Message #79 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 54657 <at> debbugs.gnu.org, schwab <at> linux-m68k.org, dal-blazej <at> onenetbeyond.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Mon, 04 Apr 2022 12:31:21 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> So I guess this is something the libxml people have fixed sometime
>> between 2.9.10 and 2.9.13.  
>
> I guess this should be in PROBLEMS?

I guess so, but it's a pretty obscure problem -- it's really unusual to
have that much inline <script> stuff in the HTML.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#54657; Package emacs. (Thu, 21 Apr 2022 14:24:02 GMT) Full text and rfc822 format available.

Message #82 received at 54657 <at> debbugs.gnu.org (full text, mbox):

From: dal-blazej <at> onenetbeyond.org
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 54657 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>, schwab <at> linux-m68k.org
Subject: Re: bug#54657: 29.0.50; 100% CPU usage with eww on
 https://blogsurf.io/
Date: Thu, 21 Apr 2022 16:23:21 +0200
> I guess so, but it's a pretty obscure problem

I think so. I'll close.

Thanks for your insights.




bug closed, send any further explanations to 54657 <at> debbugs.gnu.org and dal-blazej <at> onenetbeyond.org Request was from dal-blazej <at> onenetbeyond.org to control <at> debbugs.gnu.org. (Thu, 21 Apr 2022 14:26:01 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 20 May 2022 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 30 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.