GNU bug report logs - #31665
`libxml-parse-html-region' doesn't extract text in tables

Package: emacs

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Thu, 31 May 2018 09:56:02 UTC

Severity: minor

Tags: fixed, moreinfo

Fixed in version 27.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Katsumi Yamaoka <yamaoka <at> jpl.org>, 31665 <at> debbugs.gnu.org
Subject: bug#31665: `libxml-parse-html-region' doesn't extract text in tables
Date: Mon, 30 Sep 2019 00:52:40 +0800
>>>>> "LI" == Lars Ingebrigtsen <larsi <at> gnus.org> writes:
LI> 積丹尼 Dan Jacobson <jidanni <at> jidanni.org> writes:

>>>>>>> "LI" == Lars Ingebrigtsen <larsi <at> gnus.org> writes:
>> 
LI> Do you have an example table that `libxml-parse-html-region' doesn't
LI> "extract" text from?
>> 
>> OK, here is a mail that I have scrubbed of my personal phone bill details:

LI> What was it you think is missing from that table?  I don't read Chinese,
LI> but there didn't seem to be any text in that table, just a bunch of
LI> images.

It should look like:

+----------------------------------------------------------------------------------------------------------------------------------------------------+
|+---------------------------------------------------------------------------------------------------------------------+                             |
||+------------------------------------------------------------------------------------------------------------------+ |                             |
|||[banner2]                                                                                                         | |                             |
|||------------------------------------------------------------------------------------------------------------------| |                             |
|||+---------------------------------------------------------------------------------------------------------------+ | |                             |
||||                                    |親愛的客戶,您好:                   |                                    | | |                             |
||||                                    |-------------------------------------|                                    | | |                             |
||||                                    |為保障您資料的安全,請輸入密碼開啟附 |                                    | | |                             |
||||                                    |加檔案瀏覽您本期的帳單,密碼為『身分 |                                    | | |                             |
||||               [IS1]                |證號碼』(英文字母須大寫),營業人客戶 |               [IS2]                | | |                             |
||||                                    |不需輸入密碼即可瀏覽。               |                                    | | |                             |
||||                                    |若無法開啟附加檔案,請先確認是否已下 |                                    | | |                             |
||||                                    |載Acrobat Reader軟體。               |                                    | | |                             |
||||                                    |-------------------------------------|                                    | | |                             |
|||+---------------------------------------------------------------------------------------------------------------+ | |                             |
||+------------------------------------------------------------------------------------------------------------------+ |                             |
||++                                                                                                                   |                             |
||||                                                                                                                   |                             |
||++                                                                                                                   |                             |
||+-------------------------------------------------------------------------------------------------------------------+|                             |
|||[new1]                                                                                                             ||                             |
|||+-----------------------------------------------------------------------------------------------------------------+||                             |
||||                                                        |                                                [enf201]|||                             |
||||                                                        |--------------------------------------------------------|||                             |
||||[end101]                                                |                                                [enl301]|||                             |
||||                                                        |--------------------------------------------------------|||                             |
||||                                                        |                                                [enl401]|||                             |
|||+-----------------------------------------------------------------------------------------------------------------+||                             |
||+-------------------------------------------------------------------------------------------------------------------+|                             |
||++                                                                                                                   |                             |
||||                                                                                                                   |                             |
||++                                                                                                                   |                             |
||+------------------------------------------------------------------------------------------------------------------+ |                             |
|||[hot1]                                                                                                            | |                             |
|||------------------------------------------------------------------------------------------------------------------| |                             |
|||+----------------------------------+                                                                              | |                             |
||||[hot1]|[hot2]|[hot3]|[hot4]|[hot5]|                                                                              | |                             |
|||+----------------------------------+                                                                              | |                             |
||+------------------------------------------------------------------------------------------------------------------+ |                             |
||++                                                                                                                   |                             |
||||                                                                                                                   |                             |
||++                                                                                                                   |                             |
||+------------------------------------------------------------------------------------------------------------------+ |                             |
|||[link1]                                                                                                           | |                             |
|||+-----------------------------------------------------------------+                                               | |                             |
||||||            |                |                |                |                                               | |                             |
||||++------------+----------------+----------------+----------------|                                               | |                             |
||||||電子帳單Q&A |    費率說明    |  客戶消費資訊  |    線上繳費    |                                               | |                             |
||||++------------+----------------+----------------+----------------|                                               | |                             |
||||||  服務專線  |    貼心提醒    |不可不知行動優惠| HiNet好康優惠  |                                               | |                             |
|||+-----------------------------------------------------------------+                                               | |                             |
||+------------------------------------------------------------------------------------------------------------------+ |                             |
||++                                                                                                                   |                             |
||||                                                                                                                   |                             |
||++                                                                                                                   |                             |
||+------------------------------------------------------------------------------------------------------------------+ |                             |
|||                                                      [cht]                                                       | |                             |
||+------------------------------------------------------------------------------------------------------------------+ |                             |
|+---------------------------------------------------------------------------------------------------------------------+                             |
+----------------------------------------------------------------------------------------------------------------------------------------------------+

But instead all we get is:

From: Phone Co. <p <at> cht.com.tw>
Subject: Phone Bill
To: "jidanni <at> jidanni.org" <jidanni <at> jidanni.org>
Date: Thu, 17 May 2018 12:12:06 +0800
Reply-To: x <at> cht.com.tw

[1. text/html]
中華電信電子帳單

*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*





This bug report was last modified 5 years and 229 days ago.
