From unknown Fri Jun 13 10:10:15 2025
From: bug#31665 <31665@debbugs.gnu.org>
To: bug#31665 <31665@debbugs.gnu.org>
Subject: Status: libxml-parse-html-region' doesn't extract text in tables
Reply-To: bug#31665 <31665@debbugs.gnu.org>
Date: Fri, 13 Jun 2025 17:10:15 +0000

retitle 31665 libxml-parse-html-region' doesn't extract text in tables
reassign 31665 emacs
submitter 31665 積丹尼 Dan Jacobson
severity 31665 minor
tag 31665 moreinfo fixed
thanks

From debbugs-submit-bounces@debbugs.gnu.org Thu May 31 05:55:30 2018
From: 積丹尼 Dan Jacobson
To: bug-gnu-emacs@gnu.org
Cc: Katsumi Yamaoka
Subject: libxml-parse-html-region' doesn't extract text in tables
References: <8736zjmtsa.fsf@jidanni.org>
Date: Thu, 31 May 2018 17:55:04 +0800
Message-ID: <87tvqof8uf.fsf_-_@jidanni.org>

Dear bug-gnu-emacs, libxml-parse-html-region' doesn't extract text in
<table>s,

KY> I found that Emacs' built-in function `libxml-parse-html-region'
KY> doesn't extract text existing in the table clause.

From debbugs-submit-bounces@debbugs.gnu.org Thu May 31 06:58:47 2018
From: Lars Ingebrigtsen
To: 積丹尼 Dan Jacobson
Cc: Katsumi Yamaoka, 31665@debbugs.gnu.org
Subject: Re: bug#31665: libxml-parse-html-region' doesn't extract text in tables
References: <8736zjmtsa.fsf@jidanni.org> <87tvqof8uf.fsf_-_@jidanni.org>
In-Reply-To: <87tvqof8uf.fsf_-_@jidanni.org>
Date: Thu, 31 May 2018 12:58:37 +0200

積丹尼 Dan Jacobson writes:

> Dear bug-gnu-emacs, libxml-parse-html-region' doesn't extract text in
> <table>s,

Do you have an example table that `libxml-parse-html-region' doesn't
"extract" text from?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

From debbugs-submit-bounces@debbugs.gnu.org Sat Jun 02 20:18:04 2018
From: Noam Postavsky
To: control@debbugs.gnu.org
Subject: control message for bug #31665
Date: Sat, 02 Jun 2018 20:17:54 -0400
Message-ID: <87bmcsafkd.fsf@gmail.com>

tags 31665 + moreinfo
quit

From debbugs-submit-bounces@debbugs.gnu.org Thu Jun 07 03:40:19 2018
From: 積丹尼 Dan Jacobson
To: Lars Ingebrigtsen
Cc: Katsumi Yamaoka, 31665@debbugs.gnu.org
Subject: Re: bug#31665: libxml-parse-html-region' doesn't extract text in tables
References: <8736zjmtsa.fsf@jidanni.org> <87tvqof8uf.fsf_-_@jidanni.org>
Date: Thu, 07 Jun 2018 04:50:15 +0800
Message-ID: <87d0x3od14.fsf@jidanni.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

>>>>> "LI" == Lars Ingebrigtsen writes:

LI> Do you have an example table that `libxml-parse-html-region' doesn't
LI> "extract" text from?

OK here is a mail that I cleaned off my personal phone bill from:

--=-=-=
Content-Type: application/gzip
Content-Disposition: attachment; filename=gg.gz
Content-Transfer-Encoding: base64

[base64 payload of gg.gz omitted]
--=-=-=--

From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 04:35:05 2019
From: Lars Ingebrigtsen
To: 積丹尼 Dan Jacobson
Cc: Katsumi Yamaoka, 31665@debbugs.gnu.org
Subject: Re: bug#31665: libxml-parse-html-region' doesn't extract text in tables
References: <8736zjmtsa.fsf@jidanni.org> <87tvqof8uf.fsf_-_@jidanni.org> <87d0x3od14.fsf@jidanni.org>
In-Reply-To: <87d0x3od14.fsf@jidanni.org>
Date: Sun, 29 Sep 2019 10:34:56 +0200
Message-ID: <87r23zqz4f.fsf@gnus.org>

積丹尼 Dan Jacobson writes:

>>>>>> "LI" == Lars Ingebrigtsen writes:
>
> LI> Do you have an example table that `libxml-parse-html-region' doesn't
> LI> "extract" text from?
>
> OK here is a mail that I cleaned off my personal phone bill from:

What was it you think is missing from that table?  I don't read Chinese,
but there didn't seem to be any text in that table, just a bunch of
images.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 04:35:14 2019
From: Lars Ingebrigtsen
To: control@debbugs.gnu.org
Subject: control message for bug #31665
Date: Sun, 29 Sep 2019 10:35:09 +0200
Message-Id: <87pnjjqz42.fsf@gnus.org>

tags 31665 + moreinfo
quit

From debbugs-submit-bounces@debbugs.gnu.org Sun Sep 29 12:52:58 2019
From: 積丹尼 Dan Jacobson
To: Lars Ingebrigtsen
Cc: Katsumi Yamaoka, 31665@debbugs.gnu.org
Subject: Re: bug#31665: libxml-parse-html-region' doesn't extract text in tables
References: <8736zjmtsa.fsf@jidanni.org> <87tvqof8uf.fsf_-_@jidanni.org> <87d0x3od14.fsf@jidanni.org> <87r23zqz4f.fsf@gnus.org>
Date: Mon, 30 Sep 2019 00:52:40 +0800
Message-ID: <877e5rgi3r.5.fsf@jidanni.org>

>>>>> "LI" == Lars Ingebrigtsen writes:

LI> 積丹尼 Dan Jacobson writes:
>>>>>>> "LI" == Lars Ingebrigtsen writes:
>>
LI> Do you have an example table that `libxml-parse-html-region' doesn't
LI> "extract" text from?
>>
>> OK here is a mail that I cleaned off my personal phone bill from:

LI> What was it you think is missing from that table?  I don't read Chinese,
LI> but there didn't seem to be any text in that table, just a bunch of
LI> images.

It should look like: +------------------------------------------------------------------------= -------------------------------------------------------------------------= ---+ |+-----------------------------------------------------------------------= ----------------------------------------------+ = | ||+----------------------------------------------------------------------= --------------------------------------------+ | = | |||[banner2] = | | = | |||----------------------------------------------------------------------= --------------------------------------------| | = | |||+---------------------------------------------------------------------= ------------------------------------------+ | | = | |||| |=E8=A6=AA=E6=84=9B=E7=9A=84=E5=AE= =A2=E6=88=B6=EF=BC=8C=E6=82=A8=E5=A5=BD=EF=BC=9A | = | | | | |||| |--------------------------------= -----| | | | = | |||| |=E7=82=BA=E4=BF=9D=E9=9A=9C=E6=82= =A8=E8=B3=87=E6=96=99=E7=9A=84=E5=AE=89=E5=85=A8=EF=BC=8C=E8=AB=8B=E8=BC=B8= =E5=85=A5=E5=AF=86=E7=A2=BC=E9=96=8B=E5=95=9F=E9=99=84 | = | | | | |||| |=E5=8A=A0=E6=AA=94=E6=A1=88=E7=80= =8F=E8=A6=BD=E6=82=A8=E6=9C=AC=E6=9C=9F=E7=9A=84=E5=B8=B3=E5=96=AE=EF=BC=8C= =E5=AF=86=E7=A2=BC=E7=82=BA=E3=80=8E=E8=BA=AB=E5=88=86 | = | | | | |||| [IS1] |=E8=AD=89=E8=99=9F=E7=A2=BC=E3=80= =8F(=E8=8B=B1=E6=96=87=E5=AD=97=E6=AF=8D=E9=A0=88=E5=A4=A7=E5=AF=AB)=EF=BC= =8C=E7=87=9F=E6=A5=AD=E4=BA=BA=E5=AE=A2=E6=88=B6 | [IS2] = | | | | |||| |=E4=B8=8D=E9=9C=80=E8=BC=B8=E5=85= =A5=E5=AF=86=E7=A2=BC=E5=8D=B3=E5=8F=AF=E7=80=8F=E8=A6=BD=E3=80=82 = | | | | = | |||| |=E8=8B=A5=E7=84=A1=E6=B3=95=E9=96= =8B=E5=95=9F=E9=99=84=E5=8A=A0=E6=AA=94=E6=A1=88=EF=BC=8C=E8=AB=8B=E5=85=88= =E7=A2=BA=E8=AA=8D=E6=98=AF=E5=90=A6=E5=B7=B2=E4=B8=8B | = | | | | |||| |=E8=BC=89Acrobat Reader=E8=BB=9F= =E9=AB=94=E3=80=82 | | |= | | |||| |--------------------------------= -----| | | | = | |||+---------------------------------------------------------------------= ------------------------------------------+ | | = | 
||+----------------------------------------------------------------------= --------------------------------------------+ | = | ||++ = | = | |||| = | = | ||++ = | = | ||+----------------------------------------------------------------------= ---------------------------------------------+| = | |||[new1] = || = | |||+---------------------------------------------------------------------= --------------------------------------------+|| = | |||| | = [enf201]||| = | |||| |------------= --------------------------------------------||| = | ||||[end101] | = [enl301]||| = | |||| |------------= --------------------------------------------||| = | |||| | = [enl401]||| = | |||+---------------------------------------------------------------------= --------------------------------------------+|| = | ||+----------------------------------------------------------------------= ---------------------------------------------+| = | ||++ = | = | |||| = | = | ||++ = | = | ||+----------------------------------------------------------------------= --------------------------------------------+ | = | |||[hot1] = | | = | |||----------------------------------------------------------------------= --------------------------------------------| | = | |||+----------------------------------+ = | | = | ||||[hot1]|[hot2]|[hot3]|[hot4]|[hot5]| = | | = | |||+----------------------------------+ = | | = | ||+----------------------------------------------------------------------= --------------------------------------------+ | = | ||++ = | = | |||| = | = | ||++ = | = | ||+----------------------------------------------------------------------= --------------------------------------------+ | = | |||[link1] = | | = | |||+-----------------------------------------------------------------+ = | | = | |||||| | | | | = | | = | ||||++------------+----------------+----------------+----------------| = | | = | ||||||=E9=9B=BB=E5=AD=90=E5=B8=B3=E5=96=AEQ&A | =E8=B2=BB=E7=8E=87=E8=AA= =AA=E6=98=8E | 
=E5=AE=A2=E6=88=B6=E6=B6=88=E8=B2=BB=E8=B3=87=E8=A8=8A= | =E7=B7=9A=E4=B8=8A=E7=B9=B3=E8=B2=BB | = | | | ||||++------------+----------------+----------------+----------------| = | | = | |||||| =E6=9C=8D=E5=8B=99=E5=B0=88=E7=B7=9A | =E8=B2=BC=E5=BF=83=E6=8F= =90=E9=86=92 |=E4=B8=8D=E5=8F=AF=E4=B8=8D=E7=9F=A5=E8=A1=8C=E5=8B=95=E5= =84=AA=E6=83=A0| HiNet=E5=A5=BD=E5=BA=B7=E5=84=AA=E6=83=A0 | = | | | |||+-----------------------------------------------------------------+ = | | = | ||+----------------------------------------------------------------------= --------------------------------------------+ | = | ||++ = | = | |||| = | = | ||++ = | = | ||+----------------------------------------------------------------------= --------------------------------------------+ | = | ||| [cht] = | | = | ||+----------------------------------------------------------------------= --------------------------------------------+ | = | |+-----------------------------------------------------------------------= ----------------------------------------------+ = | +------------------------------------------------------------------------= -------------------------------------------------------------------------= ---+ But instead all we get is: From: Phone Co. Subject: Phone Bill To: "jidanni@jidanni.org" Date: Thu, 17 May 2018 12:12:06 +0800 Reply-To: x@cht.com.tw [1. 
text/html] =E4=B8=AD=E8=8F=AF=E9=9B=BB=E4=BF=A1=E9=9B=BB=E5=AD=90=E5=B8=B3=E5=96=AE * * * * * * * * * * * * * * * * From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 30 01:05:31 2019 Received: (at 31665) by debbugs.gnu.org; 30 Sep 2019 05:05:31 +0000 Received: from localhost ([127.0.0.1]:56495 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEnsJ-0003Eu-GK for submit@debbugs.gnu.org; Mon, 30 Sep 2019 01:05:31 -0400 Received: from quimby.gnus.org ([80.91.231.51]:43204) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1iEnsH-0003Em-Ck for 31665@debbugs.gnu.org; Mon, 30 Sep 2019 01:05:30 -0400 Received: from cm-84.212.202.86.getinternet.no ([84.212.202.86] helo=marnie) by quimby.gnus.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89) (envelope-from ) id 1iEnsB-00033i-LR; Mon, 30 Sep 2019 07:05:26 +0200 From: Lars Ingebrigtsen To: =?utf-8?B?56mN5Li55bC8?= Dan Jacobson Subject: Re: bug#31665: libxml-parse-html-region' doesn't extract text in tables References: <8736zjmtsa.fsf@jidanni.org> <87tvqof8uf.fsf_-_@jidanni.org> <87d0x3od14.fsf@jidanni.org> <87r23zqz4f.fsf@gnus.org> <877e5rgi3r.5.fsf@jidanni.org> Date: Mon, 30 Sep 2019 07:05:23 +0200 In-Reply-To: <877e5rgi3r.5.fsf@jidanni.org> (=?utf-8?B?IuepjeS4ueWwvA==?= Dan Jacobson"'s message of "Mon, 30 Sep 2019 00:52:40 +0800") Message-ID: <87blv2ml0s.fsf@gnus.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: The HTML in that email is invalid. It's basically on the form
 foo
 "foo" won't be rendered by shr.
 Content analysis details:   (-2.9 points, 5.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 31665
Cc: Katsumi Yamaoka , 31665@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

The HTML in that email is invalid. It's basically on the form

foo

"foo" won't be rendered by shr.

shr does try to deal with invalid tables, though. If the
elements hadn't been there, then the "foo" would have been, so I guess
some more work is required in that area.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 30 01:28:28 2019
Received: (at 31665) by debbugs.gnu.org; 30 Sep 2019 05:28:28 +0000
Received: from localhost ([127.0.0.1]:56529 helo=debbugs.gnu.org)
    by debbugs.gnu.org with esmtp (Exim 4.84_2)
    (envelope-from )
    id 1iEoEW-00041s-53
    for submit@debbugs.gnu.org; Mon, 30 Sep 2019 01:28:28 -0400
Received: from quimby.gnus.org ([80.91.231.51]:43578)
    by debbugs.gnu.org with esmtp (Exim 4.84_2)
    (envelope-from )
    id 1iEoEU-00041f-D4
    for 31665@debbugs.gnu.org; Mon, 30 Sep 2019 01:28:26 -0400
Received: from cm-84.212.202.86.getinternet.no ([84.212.202.86] helo=marnie)
    by quimby.gnus.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
    (Exim 4.89)
    (envelope-from )
    id 1iEoEN-0003Ha-QS; Mon, 30 Sep 2019 07:28:23 +0200
From: Lars Ingebrigtsen
To: =?utf-8?B?56mN5Li55bC8?= Dan Jacobson
Subject: Re: bug#31665: libxml-parse-html-region' doesn't extract text in tables
References: <8736zjmtsa.fsf@jidanni.org> <87tvqof8uf.fsf_-_@jidanni.org>
    <87d0x3od14.fsf@jidanni.org> <87r23zqz4f.fsf@gnus.org>
    <877e5rgi3r.5.fsf@jidanni.org> <87blv2ml0s.fsf@gnus.org>
Date: Mon, 30 Sep 2019 07:28:19 +0200
In-Reply-To: <87blv2ml0s.fsf@gnus.org> (Lars Ingebrigtsen's message of
    "Mon, 30 Sep 2019 07:05:23 +0200")
Message-ID: <875zlamjyk.fsf@gnus.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org",
 has NOT identified this incoming email as spam. The original
 message has been attached to this so you can view it or label
 similar future email. If you have any questions, see
 @@CONTACT_ADDRESS@@ for details.
 Content preview: Lars Ingebrigtsen writes: > shr does try to deal with invalid
 tables, though. If the > elements hadn't been there, then the "foo" would
 have been, so I guess > some more work is required in that area.
 Content analysis details:   (-2.9 points, 5.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 31665
Cc: Katsumi Yamaoka , 31665@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

Lars Ingebrigtsen writes:

> shr does try to deal with invalid tables, though. If the
> elements hadn't been there, then the "foo" would have been, so I guess
> some more work is required in that area.

I've now fixed this on the trunk.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 30 01:28:31 2019
Received: (at control) by debbugs.gnu.org; 30 Sep 2019 05:28:31 +0000
Received: from localhost ([127.0.0.1]:56532 helo=debbugs.gnu.org)
    by debbugs.gnu.org with esmtp (Exim 4.84_2)
    (envelope-from )
    id 1iEoEZ-000428-Ep
    for submit@debbugs.gnu.org; Mon, 30 Sep 2019 01:28:31 -0400
Received: from quimby.gnus.org ([80.91.231.51]:43594)
    by debbugs.gnu.org with esmtp (Exim 4.84_2)
    (envelope-from )
    id 1iEoEX-000420-Kn
    for control@debbugs.gnu.org; Mon, 30 Sep 2019 01:28:29 -0400
Received: from cm-84.212.202.86.getinternet.no ([84.212.202.86] helo=marnie)
    by quimby.gnus.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
    (Exim 4.89)
    (envelope-from )
    id 1iEoEU-0003Hj-TY
    for control@debbugs.gnu.org; Mon, 30 Sep 2019 07:28:28 +0200
Date: Mon, 30 Sep 2019 07:28:26 +0200
Message-Id: <874l0umjyd.fsf@gnus.org>
To: control@debbugs.gnu.org
From: Lars Ingebrigtsen
Subject: control message for bug #31665
X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org",
 has NOT identified this incoming email as spam. The original
 message has been attached to this so you can view it or label
 similar future email. If you have any questions, see
 @@CONTACT_ADDRESS@@ for details.
 Content preview: tags 31665 fixed close 31665 27.1 quit
 Content analysis details:   (-2.9 points, 5.0 required)
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -1.0 ALL_TRUSTED            Passed through trusted hosts only via SMTP
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

tags 31665 fixed
close 31665 27.1
quit

From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 30 22:43:25 2019
Received: (at 31665) by debbugs.gnu.org; 1 Oct 2019 02:43:25 +0000
Received: from localhost ([127.0.0.1]:60765 helo=debbugs.gnu.org)
    by debbugs.gnu.org with esmtp (Exim 4.84_2)
    (envelope-from )
    id 1iF88L-0006fU-33
    for submit@debbugs.gnu.org; Mon, 30 Sep 2019 22:43:25 -0400
Received: from bonobo.elm.relay.mailchannels.net ([23.83.212.22]:48684)
    by debbugs.gnu.org with esmtp (Exim 4.84_2)
    (envelope-from )
    id 1iF88I-0006fK-CZ
    for 31665@debbugs.gnu.org; Mon, 30 Sep 2019 22:43:23 -0400
X-Sender-Id: tih5qno0ow|x-authuser|yamaoka@hampton.hostforweb.net
Received: from relay.mailchannels.net (localhost [127.0.0.1])
    by relay.mailchannels.net (Postfix) with ESMTP id D0D648C1461;
    Tue, 1 Oct 2019 02:43:20 +0000 (UTC)
Received: from hampton.hostforweb.net
    (100-96-88-238.trex.outbound.svc.cluster.local [100.96.88.238])
    (Authenticated sender: tih5qno0ow)
    by relay.mailchannels.net (Postfix) with ESMTPA id 02E6E8C1C05;
    Tue, 1 Oct 2019 02:43:19 +0000 (UTC)
X-Sender-Id: tih5qno0ow|x-authuser|yamaoka@hampton.hostforweb.net
Received: from hampton.hostforweb.net ([TEMPUNAVAIL].
    [172.245.115.217])
    (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384)
    by 0.0.0.0:2500 (trex/5.18.2);
    Tue, 01 Oct 2019 02:43:20 +0000
X-MC-Relay: Neutral
X-MailChannels-SenderId: tih5qno0ow|x-authuser|yamaoka@hampton.hostforweb.net
X-MailChannels-Auth-Id: tih5qno0ow
X-Shrill-Relation: 0efa36736b11fbe1_1569897800500_681914642
X-MC-Loop-Signature: 1569897800500:2538503838
X-MC-Ingress-Time: 1569897800499
Received: from s70.gtokyofl21.vectant.ne.jp ([202.215.75.70]:60000 helo=localhost)
    by hampton.hostforweb.net with esmtpsa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256)
    (Exim 4.91)
    (envelope-from )
    id 1iF887-00G2wR-KO; Mon, 30 Sep 2019 21:43:12 -0500
Date: Tue, 01 Oct 2019 11:43:09 +0900
Message-ID:
From: Katsumi Yamaoka
To: Lars Ingebrigtsen
Subject: Re: bug#31665: libxml-parse-html-region' doesn't extract text in tables
References: <8736zjmtsa.fsf@jidanni.org> <87tvqof8uf.fsf_-_@jidanni.org>
    <87d0x3od14.fsf@jidanni.org> <87r23zqz4f.fsf@gnus.org>
    <877e5rgi3r.5.fsf@jidanni.org> <87blv2ml0s.fsf@gnus.org>
    <875zlamjyk.fsf@gnus.org>
Organization: Emacsen advocacy group
X-Face: #kKnN,xUnmKia.'[pp`;
 Omh}odZK)?7wQSl"4o04=EixTF+V[""w~iNbM9ZL+.b*_CxUmFk
 B#Fu[*?MZZH@IkN:!"\w%I_zt>[$nm7nQosZ<3eu;
 B:$Q_:p!',P.c0-_Cy[dz4oIpw0ESA^D*1Lw=
 L&i*6&(
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (x86_64-pc-cygwin)
Cancel-Lock: sha1:WPgUvyOCxTQEC17TM5LrAzCPflQ=
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-OutGoing-Spam-Status: No, score=-0.2
X-AuthUser: yamaoka@hampton.hostforweb.net
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 31665
Cc: 31665@debbugs.gnu.org, =?utf-8?B?56mN5Li55bC8?= Dan Jacobson
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

On Mon, 30 Sep 2019 07:28:19 +0200, Lars Ingebrigtsen wrote:
> I've now fixed this on the trunk.

Verified.
Thank you for improving it!

From unknown Fri Jun 13 10:10:15 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 29 Oct 2019 11:24:07 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator