GNU bug report logs - #29902
[PATCH] gnu: Add html-xml-utils.
Reported by: Stefan Reichör <stefan <at> xsteve.at>
Date: Fri, 29 Dec 2017 21:01:02 UTC
Severity: normal
Tags: patch
Done: ludo <at> gnu.org (Ludovic Courtès)
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29902 in the body.
You can then email your comments to 29902 AT debbugs.gnu.org in the normal way.
Report forwarded to guix-patches <at> gnu.org:
bug#29902; Package guix-patches. (Fri, 29 Dec 2017 21:01:02 GMT)
Full text and rfc822 format available.
Acknowledgement sent to Stefan Reichör <stefan <at> xsteve.at>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Fri, 29 Dec 2017 21:01:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
* gnu/packages/xml.scm (html-xml-utils): New variable.
---
gnu/packages/xml.scm | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/gnu/packages/xml.scm b/gnu/packages/xml.scm
index 344d7c3..dde1964 100644
--- a/gnu/packages/xml.scm
+++ b/gnu/packages/xml.scm
@@ -1116,6 +1116,60 @@ match and extract data, and elements can be added, deleted or modified using
XSLT and EXSLT.")
(license license:x11)))
+(define-public html-xml-utils
+ (package
+ (name "html-xml-utils")
+ (version "7.4")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (string-append
+ "https://www.w3.org/Tools/HTML-XML-utils/html-xml-utils-"
+ version ".tar.gz"))
+ (sha256
+ (base32
+ "04pgrahsfawnzd9pilvirs05pfdgsd7qwvw4dvkb42rgybhw6h95"))))
+ (build-system gnu-build-system)
+ (home-page "https://www.w3.org/Tools/HTML-XML-utils/")
+ (synopsis "Command line utilities to manipulate HTML and XML files")
+ (description "HTML-XML-utils provides a number of simple utilities for
+manipulating and converting HTML and XML files in various ways. The suite
+consists of the following tools:
+
+@itemize
+ @item @command{asc2xml} convert from @code{UTF-8} to @code{&#nnn;} entities
+ @item @command{xml2asc} convert from @code{&#nnn;} entities to @code{UTF-8}
+ @item @command{hxaddid} add IDs to selected elements
+ @item @command{hxcite} replace bibliographic references by hyperlinks
+ @item @command{hxcite-mkbib} expand references and create bibliography
+ @item @command{hxclean} apply heuristics to correct an HTML file
+ @item @command{hxcopy} copy an HTML file while preserving relative links
+ @item @command{hxcount} count elements and attributes in HTML or XML files
+ @item @command{hxextract} extract selected elements
+ @item @command{hxincl} expand included HTML or XML files
+ @item @command{hxindex} create an alphabetically sorted index
+ @item @command{hxmkbib} create bibliography from a template
+ @item @command{hxmultitoc} create a table of contents for a set of HTML files
+ @item @command{hxname2id} move some @code{ID=} or @code{NAME=} from A elements to their parents
+ @item @command{hxnormalize} pretty-print an HTML file
+ @item @command{hxnsxml} convert output of hxxmlns back to normal XML
+ @item @command{hxnum} number section headings in an HTML file
+ @item @command{hxpipe} convert XML to a format easier to parse with Perl or AWK
+ @item @command{hxprintlinks} number links and add table of URLs at end of an HTML file
+ @item @command{hxprune} remove marked elements from an HTML file
+ @item @command{hxref} generate cross-references
+ @item @command{hxselect} extract elements that match a (CSS) selector
+ @item @command{hxtoc} insert a table of contents in an HTML file
+ @item @command{hxuncdata} replace CDATA sections by character entities
+ @item @command{hxunent} replace HTML predefined character entities by @code{UTF-8}
+ @item @command{hxunpipe} convert output of pipe back to XML format
+ @item @command{hxunxmlns} replace \"global names\" by XML Namespace prefixes
+ @item @command{hxwls} list links in an HTML file
+ @item @command{hxxmlns} replace XML Namespace prefixes by \"global names\"
+@end itemize
+")
+ (license license:expat)))
+
(define-public xlsx2csv
(package
(name "xlsx2csv")
Information forwarded to guix-patches <at> gnu.org:
bug#29902; Package guix-patches. (Sun, 31 Dec 2017 06:31:02 GMT)
Message #8 received at 29902 <at> debbugs.gnu.org:
Hi Stefan!
Thanks for contributing!
I linted your patch and got:
gnu/packages/xml.scm:1120:1: html-xml-utils <at> 7.4: line 1153 is way too long (96 characters)
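The warning above comes from the long-line part of the formatting checks in `guix lint`. As a quick local pre-check before resubmitting, one could approximate it with a small POSIX shell function built on awk. This is a sketch, not the linter's actual code: the function name `check_long_lines` is hypothetical, and the default limit of 90 characters is an assumption about the threshold Guix used at the time.

```shell
#!/bin/sh
# Hypothetical helper approximating the long-line warning of `guix lint`.
# Usage: check_long_lines FILE [LIMIT]
# The default LIMIT of 90 is an assumption, not read from Guix itself.
# Note: with some awk implementations or locales, length() counts bytes
# rather than characters, so lines with non-ASCII text may be over-reported.
check_long_lines() {
    awk -v limit="${2:-90}" -v file="$1" '
        length($0) > limit {
            printf "%s: line %d is way too long (%d characters)\n",
                   file, NR, length($0)
        }' "$1"
}
```

For example, `check_long_lines gnu/packages/xml.scm` prints one message per offending line. Unlike `guix lint`, it scans the whole file rather than a single package definition.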
Also, running
./pre-inst-env guix build --rounds=2 html-xml-utils
didn't help: it just returned the existing store item, because I had already built the package without thinking :-/
Apart from this, I'd say it's OK. It builds, but I didn't try to run any of these commands.
Can you suggest a command line and a set of HTML files to test them?
This is just to be super scrupulous, anyway; if you say it works, I believe you.
So, as far as I'm concerned: LGTM!
Information forwarded to guix-patches <at> gnu.org:
bug#29902; Package guix-patches. (Sun, 31 Dec 2017 08:24:02 GMT)
Message #11 received at submit <at> debbugs.gnu.org:
Hi Catonano!
Thanks for your review.
> Hi Stefan !
>
> Thanks for contributing !
>
> I linted your patch and I get
>
> gnu/packages/xml.scm:1120:1: html-xml-utils <at> 7.4: line 1153 is way too long
> (96 characters)
I fixed this.
> Also, I couldn't run
>
> ./pre-inst-env guix build --rounds=2 html-xml-utils
>
> it just returns the store item as I had already built it without thinking
> :-/
>
> Apart from this, I'd say it's ok
>
> It builds. I didn't try to run any of these commands.
>
> Can you suggest me a command line and a set of html files to test them ?
I am not aware of a lot of documentation with examples for these tools.
Here is some stuff I found on the web:
http://joeferner.github.io/2015/07/15/linux-command-line-html-and-awk/
https://superuser.com/questions/528709/command-line-css-selector-tool
https://www.joyofdata.de/blog/using-linux-shell-web-scraping/
This is a command line that I use to extract links from h2 elements:
cat ~/tmp/document.html | hxnormalize -x | hxselect -i h2 | hxwls
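Stefan's pipeline can be wrapped in a small shell function for repeated use. This is a sketch: the function name `h2_links` is hypothetical, and it assumes the html-xml-utils commands are on `PATH` (for instance after installing the package under review). It redirects the file into `hxnormalize` instead of piping it through `cat`, which behaves the same here.

```shell
#!/bin/sh
# Hypothetical helper: print the link targets found inside <h2> elements
# of an HTML file. Requires hxnormalize, hxselect and hxwls on PATH.
h2_links() {
    # hxnormalize -x: rewrite possibly sloppy HTML as well-formed XML,
    #                 which hxselect expects as input.
    # hxselect -i h2: keep only <h2> elements, matching case-insensitively.
    # hxwls:          list the links contained in what is left.
    hxnormalize -x < "$1" | hxselect -i h2 | hxwls
}
```

Calling `h2_links ~/tmp/document.html` should then be equivalent to the original one-liner.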
> Well this is just to be super scrupolous, anyway. If you say this works, I
> believe you
>
> So, as far as I'm concerned: lgtm !
Below is the corrected patch (I added the missing copyright line as well)
* gnu/packages/xml.scm (html-xml-utils): New variable.
---
gnu/packages/xml.scm | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/gnu/packages/xml.scm b/gnu/packages/xml.scm
index 344d7c3..548cd1a 100644
--- a/gnu/packages/xml.scm
+++ b/gnu/packages/xml.scm
@@ -18,6 +18,7 @@
;;; Copyright © 2017 Gregor Giesen <giesen <at> zaehlwerk.net>
;;; Copyright © 2017 Alex Vong <alexvong1995 <at> gmail.com>
;;; Copyright © 2017 Petter <petter <at> mykolab.ch>
+;;; Copyright © 2017 Stefan Reichör <stefan <at> xsteve.at>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -1116,6 +1117,61 @@ match and extract data, and elements can be added, deleted or modified using
XSLT and EXSLT.")
(license license:x11)))
+(define-public html-xml-utils
+ (package
+ (name "html-xml-utils")
+ (version "7.4")
+ (source
+ (origin
+ (method url-fetch)
+ (uri (string-append
+ "https://www.w3.org/Tools/HTML-XML-utils/html-xml-utils-"
+ version ".tar.gz"))
+ (sha256
+ (base32
+ "04pgrahsfawnzd9pilvirs05pfdgsd7qwvw4dvkb42rgybhw6h95"))))
+ (build-system gnu-build-system)
+ (home-page "https://www.w3.org/Tools/HTML-XML-utils/")
+ (synopsis "Command line utilities to manipulate HTML and XML files")
+ (description "HTML-XML-utils provides a number of simple utilities for
+manipulating and converting HTML and XML files in various ways. The suite
+consists of the following tools:
+
+@itemize
+ @item @command{asc2xml} convert from @code{UTF-8} to @code{&#nnn;} entities
+ @item @command{xml2asc} convert from @code{&#nnn;} entities to @code{UTF-8}
+ @item @command{hxaddid} add IDs to selected elements
+ @item @command{hxcite} replace bibliographic references by hyperlinks
+ @item @command{hxcite-mkbib} expand references and create bibliography
+ @item @command{hxclean} apply heuristics to correct an HTML file
+ @item @command{hxcopy} copy an HTML file while preserving relative links
+ @item @command{hxcount} count elements and attributes in HTML or XML files
+ @item @command{hxextract} extract selected elements
+ @item @command{hxincl} expand included HTML or XML files
+ @item @command{hxindex} create an alphabetically sorted index
+ @item @command{hxmkbib} create bibliography from a template
+ @item @command{hxmultitoc} create a table of contents for a set of HTML files
+ @item @command{hxname2id} move some @code{ID=} or @code{NAME=} from A elements
+ to their parents
+ @item @command{hxnormalize} pretty-print an HTML file
+ @item @command{hxnsxml} convert output of hxxmlns back to normal XML
+ @item @command{hxnum} number section headings in an HTML file
+ @item @command{hxpipe} convert XML to a format easier to parse with Perl or AWK
+ @item @command{hxprintlinks} number links and add table of URLs at end of an HTML file
+ @item @command{hxprune} remove marked elements from an HTML file
+ @item @command{hxref} generate cross-references
+ @item @command{hxselect} extract elements that match a (CSS) selector
+ @item @command{hxtoc} insert a table of contents in an HTML file
+ @item @command{hxuncdata} replace CDATA sections by character entities
+ @item @command{hxunent} replace HTML predefined character entities by @code{UTF-8}
+ @item @command{hxunpipe} convert output of pipe back to XML format
+ @item @command{hxunxmlns} replace \"global names\" by XML Namespace prefixes
+ @item @command{hxwls} list links in an HTML file
+ @item @command{hxxmlns} replace XML Namespace prefixes by \"global names\"
+@end itemize
+")
+ (license license:expat)))
+
(define-public xlsx2csv
(package
(name "xlsx2csv")
Information forwarded to guix-patches <at> gnu.org:
bug#29902; Package guix-patches. (Sun, 31 Dec 2017 13:16:02 GMT)
Message #14 received at 29902 <at> debbugs.gnu.org:
Catonano,
Catonano wrote on 31/12/17 at 07:30:
> Also, I couldn't run
>
> ./pre-inst-env guix build --rounds=2 html-xml-utils
>
> it just returns the store item as I had already built it without
> thinking :-/
Been there. ‘guix build’ has a handy ‘--check’ option that solves just
this problem.
Happy times,
T G-R
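Tobias's suggestion, applied to this package, could look like the following sketch, run from the top of a Guix source checkout. `--check` rebuilds store items that are already present and reports an error if the new result is not bit-for-bit identical; per the Guix manual, adding `--keep-failed` keeps a differing output in the store for inspection. The wrapper name `recheck` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around `guix build --check`, meant to be run from the
# top of a Guix source checkout (where ./pre-inst-env lives).
# --check:       rebuild even if the item is already in the store, and fail
#                if the new output differs from the existing one.
# --keep-failed: together with --check, keep the differing output in the
#                store so the two results can be compared.
recheck() {
    ./pre-inst-env guix build --check --keep-failed "$@"
}
```

For example, `recheck html-xml-utils` forces a rebuild, whereas `--rounds=2` on an already-built package simply returned the existing store item.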
Information forwarded to guix-patches <at> gnu.org:
bug#29902; Package guix-patches. (Mon, 01 Jan 2018 14:32:01 GMT)
Message #17 received at 29902 <at> debbugs.gnu.org:
Hi Stefan!
2017-12-31 9:22 GMT+01:00 Stefan Reichör <stefan <at> xsteve.at>:
> > gnu/packages/xml.scm:1120:1: html-xml-utils <at> 7.4: line 1153 is way too
> long
> > (96 characters)
>
> I fixed this.
>
Yes, the linter doesn't report that anymore.
> I am not aware of a lot of documentation with examples for these tools.
>
> Here is some stuff I found on the web:
>
> http://joeferner.github.io/2015/07/15/linux-command-line-html-and-awk/
> https://superuser.com/questions/528709/command-line-css-selector-tool
I tried this one and got the expected result.
So not only does html-xml-utils build, it also runs correctly!
LGTM! 😊
Information forwarded to guix-patches <at> gnu.org:
bug#29902; Package guix-patches. (Mon, 01 Jan 2018 14:34:02 GMT)
Message #20 received at 29902 <at> debbugs.gnu.org:
2017-12-31 14:18 GMT+01:00 Tobias Geerinckx-Rice <me <at> tobias.gr>:
> Catonano,
>
> Catonano wrote on 31/12/17 at 07:30:
> > Also, I couldn't run
> >
> > ./pre-inst-env guix build --rounds=2 html-xml-utils
> >
> > it just returns the store item as I had already built it without
> > thinking :-/
>
> Been there. ‘guix build’ has a handy ‘--check’ option that solves just
> this problem.
>
> Happy times,
>
> T G-R
>
Thank you, Tobias!
I tried that! I didn't completely understand the output, but I'll keep this option in mind!
Ciao
Reply sent to ludo <at> gnu.org (Ludovic Courtès):
You have taken responsibility. (Mon, 08 Jan 2018 09:31:02 GMT)
Notification sent to Stefan Reichör <stefan <at> xsteve.at>:
bug acknowledged by developer. (Mon, 08 Jan 2018 09:31:02 GMT)
Message #25 received at 29902-done <at> debbugs.gnu.org:
Stefan Reichör <stefan <at> xsteve.at> skribis:
> Below is the corrected patch (I added the missing copyright line as well)
Applied, thanks!
Ludo’.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 05 Feb 2018 12:24:03 GMT)
This bug report was last modified 7 years and 193 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.