GNU bug report logs - #31949
[PATCH] gnu: Add docx2txt.


Package: guix-patches;

Reported by: Pierre Neidhardt <ambrevar <at> gmail.com>

Date: Sat, 23 Jun 2018 13:33:01 UTC

Severity: normal

Tags: patch

Done: ludo <at> gnu.org (Ludovic Courtès)

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31949 in the body.
You can then email your comments to 31949 AT debbugs.gnu.org in the normal way.



Report forwarded to guix-patches <at> gnu.org:
bug#31949; Package guix-patches. (Sat, 23 Jun 2018 13:33:02 GMT)

Acknowledgement sent to Pierre Neidhardt <ambrevar <at> gmail.com>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Sat, 23 Jun 2018 13:33:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Pierre Neidhardt <ambrevar <at> gmail.com>
To: guix-patches <at> gnu.org
Subject: [PATCH] gnu: Add docx2txt.
Date: Sat, 23 Jun 2018 15:32:37 +0200

* gnu/packages/textutils.scm (docx2txt): New variable.
---
 gnu/packages/textutils.scm | 65 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/gnu/packages/textutils.scm b/gnu/packages/textutils.scm
index 5734bf62d..8eec045a6 100644
--- a/gnu/packages/textutils.scm
+++ b/gnu/packages/textutils.scm
@@ -14,6 +14,7 @@
 ;;; Copyright © 2017 Kei Kebreau <kkebreau <at> posteo.net>
 ;;; Copyright © 2017 Alex Vong <alexvong1995 <at> gmail.com>
 ;;; Copyright © 2018 Tobias Geerinckx-Rice <me <at> tobias.gr>
+;;; Copyright © 2018 Pierre Neidhardt <ambrevar <at> gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -675,3 +676,67 @@ and Cython.")
 measuring and checking the width of strings, with support east asian text.")
     (home-page "https://github.com/jessevdk/go-flags")
     (license license:expat)))
+
+(define-public docx2txt
+  (package
+    (name "docx2txt")
+    (version "1.4")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "http://downloads.sourceforge.net/docx2txt/docx2txt-"
+                    version ".tgz"))
+              (sha256
+               (base32
+                "06vdikjvpj6qdb41d8wzfnyj44jpnknmlgbhbr1w215420lpb5xj"))))
+    (build-system gnu-build-system)
+    (inputs
+     `(("unzip" ,unzip)
+       ("perl" ,perl)))
+    (arguments
+     `(#:tests? #f                      ; No tests.
+       #:make-flags (list (string-append "BINDIR=" (assoc-ref %outputs "out") "/bin")
+                          (string-append "CONFIGDIR=" (assoc-ref %outputs "out") "/etc")
+                          ;; Makefile seems to be a bit dumb at guessing.
+                          (string-append "INSTALL=install")
+                          (string-append "PERL=perl"))
+       #:phases
+       (modify-phases %standard-phases
+         (delete 'configure)
+         (add-after 'install 'fix-install
+           (lambda* (#:key outputs inputs #:allow-other-keys)
+             (let* ((out (assoc-ref outputs "out"))
+                    (bin (string-append out "/bin"))
+                    (config (string-append out "/etc/docx2txt.config"))
+                    (unzip (assoc-ref inputs "unzip")))
+               ;; According to INSTALL, the .sh wrapper can be skipped.
+               (delete-file (string-append bin "/docx2txt.sh"))
+               (rename-file (string-append bin "/docx2txt.pl")
+                            (string-append bin "/docx2txt"))
+               (substitute* config
+                 (("config_unzip         => '/usr/bin/unzip',")
+                  (string-append "config_unzip         => '"
+                                 unzip
+                                 "/bin/unzip',")))
+               ;; Makefile is wrong.
+               (chmod config #o644)))))))
+    (synopsis "Recover text from .docx files, with good formatting")
+    (description
+     "@command{docx2txt} is a perl based command line utility to convert
+Microsoft Office™ .docx documents to equivalent text documents. Latest version
+supports following features during text extraction.
+
+@itemize
+@item Character conversions (\" ' < & > -, fractions and some mathematical
+symbols, etc.); currency characters are converted to respective names like
+Euro.
+@item Capitalisation of text blocks.
+@item Center and right justification of text fitting in a line of
+(configurable) 80 columns.
+@item Horizontal ruler, line breaks, paragraphs separation, tabs.
+@item Indicating hyperlinked text along with the hyperlink (configurable).
+@item Handling (bullet, decimal, letter, roman) lists along with (attempt at)
+indentation.
+@end itemize\n")
+    (home-page "http://docx2txt.sourceforge.net")
+    (license license:gpl3+)))
-- 
2.17.1





Information forwarded to guix-patches <at> gnu.org:
bug#31949; Package guix-patches. (Mon, 25 Jun 2018 20:59:01 GMT)

Message #8 received at 31949 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Pierre Neidhardt <ambrevar <at> gmail.com>
Cc: 31949 <at> debbugs.gnu.org
Subject: Re: [bug#31949] [PATCH] gnu: Add docx2txt.
Date: Mon, 25 Jun 2018 22:58:29 +0200

Hi,

Pierre Neidhardt <ambrevar <at> gmail.com> skribis:

> * gnu/packages/textutils.scm (docx2txt): New variable.

[...]

> +    (source (origin
> +              (method url-fetch)
> +              (uri (string-append
> +                    "http://downloads.sourceforge.net/docx2txt/docx2txt-"
> +                    version ".tgz"))

Could you use mirror://sourceforge?

> +       #:make-flags (list (string-append "BINDIR=" (assoc-ref %outputs "out") "/bin")
> +                          (string-append "CONFIGDIR=" (assoc-ref %outputs "out") "/etc")

Lines are a bit long.  :-)

> +    (synopsis "Recover text from .docx files, with good formatting")

@file{.docx} please.

> +    (description
> +     "@command{docx2txt} is a perl based command line utility to convert

s/perl/Perl/

> +Microsoft Office™ .docx documents to equivalent text documents. Latest version

No need for the trademark sign; two spaces after period.

> +@itemize
> +@item Character conversions (\" ' < & > -, fractions and some mathematical
> +symbols, etc.); currency characters are converted to respective names like
> +Euro.

Maybe you could remove what’s in parentheses?  Or use @code.

Could you send an updated patch?  Make sure ‘guix lint’ is happy.  :-)

Thanks,
Ludo’.
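
The first review point above (a mirror:// URI) can be sketched in plain Guile as follows. This is only an illustration, not part of the submitted patch: the names %version and docx2txt-uri are hypothetical, and the "docx2txt/v<version>/" directory layout on SourceForge is an assumption. The long expression is also wrapped so that no line runs past 80 columns, which is what the second point asks for.

```scheme
;; Illustrative sketch only; these names are hypothetical, not Guix API.
(define %version "1.4")

;; Assemble the source URI from the version string, so that a version
;; bump touches a single place.  The "v<version>/" directory is an
;; assumed layout of the SourceForge file tree.
(define (docx2txt-uri version)
  (string-append "mirror://sourceforge/docx2txt/docx2txt/v"
                 version "/docx2txt-" version ".tgz"))

(display (docx2txt-uri %version))
(newline)
```

In a package definition the same string-append form would sit inside the (uri ...) field of the origin, with version referring to the package's own version field.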




Information forwarded to guix-patches <at> gnu.org:
bug#31949; Package guix-patches. (Mon, 25 Jun 2018 21:23:02 GMT)

Message #11 received at 31949 <at> debbugs.gnu.org:

From: Pierre Neidhardt <ambrevar <at> gmail.com>
To: 31949 <at> debbugs.gnu.org
Subject: [PATCH] gnu: Add docx2txt.
Date: Mon, 25 Jun 2018 23:22:32 +0200

* gnu/packages/textutils.scm (docx2txt): New variable.
---
 gnu/packages/textutils.scm | 66 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/gnu/packages/textutils.scm b/gnu/packages/textutils.scm
index 5734bf62d..5dec41428 100644
--- a/gnu/packages/textutils.scm
+++ b/gnu/packages/textutils.scm
@@ -14,6 +14,7 @@
 ;;; Copyright © 2017 Kei Kebreau <kkebreau <at> posteo.net>
 ;;; Copyright © 2017 Alex Vong <alexvong1995 <at> gmail.com>
 ;;; Copyright © 2018 Tobias Geerinckx-Rice <me <at> tobias.gr>
+;;; Copyright © 2018 Pierre Neidhardt <ambrevar <at> gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -675,3 +676,68 @@ and Cython.")
 measuring and checking the width of strings, with support east asian text.")
     (home-page "https://github.com/jessevdk/go-flags")
     (license license:expat)))
+
+(define-public docx2txt
+  (package
+    (name "docx2txt")
+    (version "1.4")
+    (source (origin
+              (method url-fetch)
+              (uri (string-append
+                    "mirror://sourceforge/docx2txt/docx2txt/v"
+                    version "/docx2txt-" version ".tgz"))
+              (sha256
+               (base32
+                "06vdikjvpj6qdb41d8wzfnyj44jpnknmlgbhbr1w215420lpb5xj"))))
+    (build-system gnu-build-system)
+    (inputs
+     `(("unzip" ,unzip)
+       ("perl" ,perl)))
+    (arguments
+     `(#:tests? #f                      ; No tests.
+       #:make-flags (list (string-append "BINDIR="
+                                         (assoc-ref %outputs "out") "/bin")
+                          (string-append "CONFIGDIR="
+                                         (assoc-ref %outputs "out") "/etc")
+                          ;; Makefile seems to be a bit dumb at guessing.
+                          (string-append "INSTALL=install")
+                          (string-append "PERL=perl"))
+       #:phases
+       (modify-phases %standard-phases
+         (delete 'configure)
+         (add-after 'install 'fix-install
+           (lambda* (#:key outputs inputs #:allow-other-keys)
+             (let* ((out (assoc-ref outputs "out"))
+                    (bin (string-append out "/bin"))
+                    (config (string-append out "/etc/docx2txt.config"))
+                    (unzip (assoc-ref inputs "unzip")))
+               ;; According to INSTALL, the .sh wrapper can be skipped.
+               (delete-file (string-append bin "/docx2txt.sh"))
+               (rename-file (string-append bin "/docx2txt.pl")
+                            (string-append bin "/docx2txt"))
+               (substitute* config
+                 (("config_unzip         => '/usr/bin/unzip',")
+                  (string-append "config_unzip         => '"
+                                 unzip
+                                 "/bin/unzip',")))
+               ;; Makefile is wrong.
+               (chmod config #o644)))))))
+    (synopsis "Recover text from @file{.docx} files, with good formatting")
+    (description
+     "@command{docx2txt} is a Perl based command line utility to convert
+Microsoft Office @file{.docx} documents to equivalent text documents.  Latest
+version supports following features during text extraction.
+
+@itemize
+@item Character conversions; currency characters are converted to respective
+names like Euro.
+@item Capitalisation of text blocks.
+@item Center and right justification of text fitting in a line of
+(configurable) 80 columns.
+@item Horizontal ruler, line breaks, paragraphs separation, tabs.
+@item Indicating hyperlinked text along with the hyperlink (configurable).
+@item Handling (bullet, decimal, letter, roman) lists along with (attempt at)
+indentation.
+@end itemize\n")
+    (home-page "http://docx2txt.sourceforge.net")
+    (license license:gpl3+)))
-- 
2.17.1
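
The 'fix-install phase in the patch above uses substitute* from (guix build utils) to rewrite the hard-coded /usr/bin/unzip path in the installed configuration file. As a rough sketch of the effect on a single line, the same replacement can be imitated with plain Guile regular expressions. The store path and the name patch-unzip-path are hypothetical; a real build takes the directory from (assoc-ref inputs "unzip") and edits the file in place.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical store directory, used only for illustration.
(define unzip-dir "/gnu/store/HASH-unzip")

;; The configuration line that the patch targets.
(define config-line
  "config_unzip         => '/usr/bin/unzip',")

;; Replace every occurrence of /usr/bin/unzip in LINE with the unzip
;; binary found under UNZIP-DIR, keeping the rest of the line intact.
(define (patch-unzip-path line unzip-dir)
  (regexp-substitute/global #f "/usr/bin/unzip" line
                            'pre
                            (string-append unzip-dir "/bin/unzip")
                            'post))

(display (patch-unzip-path config-line unzip-dir))
(newline)
```

Pointing the configuration at an absolute store path means the installed script does not depend on unzip being present in the user's PATH.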





Reply sent to ludo <at> gnu.org (Ludovic Courtès):
You have taken responsibility. (Sat, 07 Jul 2018 15:54:01 GMT)

Notification sent to Pierre Neidhardt <ambrevar <at> gmail.com>:
bug acknowledged by developer. (Sat, 07 Jul 2018 15:54:02 GMT)

Message #16 received at 31949-done <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Pierre Neidhardt <ambrevar <at> gmail.com>
Cc: 31949-done <at> debbugs.gnu.org
Subject: Re: [bug#31949] [PATCH] gnu: Add docx2txt.
Date: Sat, 07 Jul 2018 17:53:28 +0200

Hello Pierre,

Pierre Neidhardt <ambrevar <at> gmail.com> skribis:

> * gnu/packages/textutils.scm (docx2txt): New variable.

Perfect.  Applied, thanks!

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 05 Aug 2018 11:24:05 GMT)



