GNU bug report logs -
#61851
[PATCH] gnu: tesseract-ocr-tessdata-fast: Install tesseract config files.
Reported by: jlicht <at> fsfe.org
Date: Mon, 27 Feb 2023 20:56:02 UTC
Severity: normal
Tags: patch
Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Bug is archived. No further changes may be made.
Message #8 received at 61851 <at> debbugs.gnu.org (full text, mbox):
Jelle,
Respectfully, and speaking only as an interested observer, I think this
may not be the right fix.
Guix's Tesseract is indeed missing its config files, which (among
other things) breaks the examples in the online documentation[0],
e.g.:
  ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
  read_params_file: Can't open hocr
  The (quick) [brown] {fox} jumps!
  Over the $43,456.78 <lazy> #90 dog
  (...)
But the root issue appears to be a misconfiguration of the
TESSDATA_PREFIX search path in the tesseract-ocr package, which causes
Tesseract's own config files to be installed in a folder other than the
one it's configured to search.
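To make the mismatch concrete, here is a rough sketch of a check for
whether a named config (such as "hocr") is present under a given
tessdata directory. It assumes Tesseract looks for configs in the
"configs" and "tessconfigs" subdirectories of that directory, which is
my understanding of its behaviour rather than something verified
against the Guix build; find_tess_config is a hypothetical helper, not
part of Tesseract or Guix.

```shell
# Hypothetical helper: print the path of config NAME under tessdata
# directory DIR, or return non-zero if it is not there.
# Assumption: configs live in DIR/configs or DIR/tessconfigs.
find_tess_config() {
  for sub in configs tessconfigs; do
    if [ -f "$1/$sub/$2" ]; then
      printf '%s\n' "$1/$sub/$2"
      return 0
    fi
  done
  return 1
}
```

Running something like find_tess_config "$TESSDATA_PREFIX" hocr inside
the affected environment should, if the diagnosis above is right, find
nothing until the search path and the install location agree.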
Fixing this places Tesseract's config files and the trained-data files
together beneath /usr/share/tessdata, allowing Tesseract to work as
expected:
  ssouth@hamlet ~/tesseract-ocr-test [env]$ tesseract images/eurotext.png - -l eng hocr
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  (...)
This approach has the advantage of keeping the
tesseract-ocr-tessdata-fast package "pure" and focused only on
trained-data files, which will be important for the patch I'm working on
that will split it into multiple packages, one for each language and
script, to allow greater flexibility.
I'll respond to this email with a draft (!) patch to tesseract-ocr that
should achieve the same result as yours, making the config files
available for use. Does this also fix the problem for you? If so,
would you consider submitting this change instead?
--
Simon South
simon <at> simonsouth.net
[0] https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html
This bug report was last modified 2 years and 66 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.