GNU bug report logs - #33844
Rename ghc-pandoc to pandoc


Package: guix;

Reported by: swedebugia <at> riseup.net

Date: Sun, 23 Dec 2018 08:47:02 UTC

Severity: normal

Done: zimoun <zimon.toutoune <at> gmail.com>

Bug is archived. No further changes may be made.




From: zimoun <zimon.toutoune <at> gmail.com>
To: Mike Gerwitz <mtg <at> gnu.org>
Cc: Guix Devel <guix-devel <at> gnu.org>, 33844 <at> debbugs.gnu.org, Efraim Flashner <efraim <at> flashner.co.il>
Subject: bug#33844: Rename ghc-pandoc to pandoc
Date: Thu, 27 Feb 2020 14:10:15 +0100
Hi Mike,

On Thu, 27 Feb 2020 at 02:23, Mike Gerwitz <mtg <at> gnu.org> wrote:

> Ah, for the record, I had searched for pandoc using `guix package -s
> pandoc` in the past and didn't find what I was looking for, and so fell
> back to a Debian system.  It turns out what I wanted was ghc-pandoc
> after all.

Thank you for pointing out the issue.

My remark is *not* about the rename, which seems fine, for the very
same reason that the "git-annex" software is named 'git-annex' and not
'ghc-git-annex'.


Well, your comment points out that: a) the description is badly
written, and b) the 'relevance' score is too rough.

The command "guix search pandoc" returns ghc-pandoc-citeproc as the
highest-ranked package, with a relevance score of 17. The package of
interest, 'ghc-pandoc', appears in 6th position with a relevance score
of 8, after emacs-pandoc-mode, ghc-pandoc-types, emacs-ox-pandoc and
python-pandocfilters (all less relevant packages, IMO).
Why? Because of the number of occurrences of the term 'pandoc' in
synopsis+description+name:
ghc-pandoc-citeproc: 1+5+1
ghc-pandoc: 0+2+1

To be precise, the score weights each field, so it reads:

ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1 = 17
ghc-pandoc: 3*0 + 2*2 + 4*1 = 8
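The weighted sum above can be reproduced with a few lines of Python. This is a minimal sketch: the per-field occurrence counts are copied from the figures above rather than recomputed from the real Guix package descriptions, and the weights (synopsis 3, description 2, name 4) are the ones quoted in this message.

```python
# Field weights as quoted above: synopsis 3, description 2, name 4.
WEIGHTS = {"synopsis": 3, "description": 2, "name": 4}

# Occurrences of 'pandoc' per field, copied from the counts above
# (not recomputed from the actual package metadata).
COUNTS = {
    "ghc-pandoc-citeproc": {"synopsis": 1, "description": 5, "name": 1},
    "ghc-pandoc": {"synopsis": 0, "description": 2, "name": 1},
}

def weighted_score(counts):
    """Sum of weight * occurrence count over all fields."""
    return sum(WEIGHTS[field] * n for field, n in counts.items())

for pkg, counts in COUNTS.items():
    print(pkg, weighted_score(counts))
# ghc-pandoc-citeproc 17
# ghc-pandoc 8
```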

The rename bumps the score because there is an additional weight
(5) for an exact match (which normally happens only for the 'name'
field).

ghc-pandoc-citeproc: 3*1 + 2*5 + 4*1 = 17
pandoc: 3*0 + 2*2 + 4*1*5 = 24
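A minimal sketch of the exact-match bonus, assuming it behaves as described here: each match of the search term in a field counts 1, or 5 when the match covers the whole field. Only the 'name' field is matched against real strings; the synopsis and description contributions (3*1 + 2*5 and 3*0 + 2*2) are taken as given from the figures above. The helper names `field_score` and `score` are hypothetical; this models the arithmetic in this message and is not Guix's own code.

```python
import re

EXACT_MATCH_WEIGHT = 5  # extra weight for an exact match, as quoted above
NAME_WEIGHT = 4

def field_score(regexp, text, exact_weight=EXACT_MATCH_WEIGHT):
    """Count matches of regexp in text; a match spanning the whole text counts exact_weight."""
    return sum(exact_weight if m.group(0) == text else 1
               for m in re.finditer(regexp, text))

# Synopsis + description contributions, taken from the figures above.
OTHER_FIELDS = {"ghc-pandoc-citeproc": 3 * 1 + 2 * 5, "pandoc": 3 * 0 + 2 * 2}

def score(name, term="pandoc"):
    """Hypothetical total score: other fields plus the weighted name match."""
    return OTHER_FIELDS[name] + NAME_WEIGHT * field_score(term, name)

print(score("ghc-pandoc-citeproc"), score("pandoc"))
# 17 24
```

The name "ghc-pandoc-citeproc" only contains 'pandoc', so it earns 1, whereas "pandoc" matches in full and earns 5.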

This apparently fixes the issue: the package named 'pandoc' now shows
up first. But it is an artefact, because it is easy* to find other
weights that invalidate this expected ranking; and the current weights
are a working rule of thumb, not deeply thought out, AFAIK.


*For example, instead of 5, let us choose 2; then the score becomes
3*0 + 2*2 + 4*1*2 = 12, which is less than 17. Well, not so easy,
because 2 is the same weight as 'description' and that seems less
natural; i.e., it appears more natural to have a high weight for an
exact match. But the point is: it is possible to find another working
rule of thumb that will not return the expected result for all the
packages.
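To see how fragile the ranking is, a short sketch can sweep the exact-match weight and check when 'pandoc' overtakes ghc-pandoc-citeproc. It assumes, following the arithmetic above, that 'pandoc' scores 3*0 + 2*2 + 4*1*w for an exact-match weight w, while ghc-pandoc-citeproc stays at 17 because its name is never an exact match. The function name `pandoc_score` is hypothetical.

```python
# ghc-pandoc-citeproc: unaffected by the exact-match weight.
CITEPROC_SCORE = 3 * 1 + 2 * 5 + 4 * 1  # 17

def pandoc_score(exact_weight):
    """Score of the renamed 'pandoc' package for a given exact-match weight."""
    return 3 * 0 + 2 * 2 + 4 * 1 * exact_weight

for w in range(1, 7):
    s = pandoc_score(w)
    print(w, s, "pandoc first" if s > CITEPROC_SCORE else "citeproc first")
# 1 8 citeproc first
# 2 12 citeproc first
# 3 16 citeproc first
# 4 20 pandoc first
# 5 24 pandoc first
# 6 28 pandoc first
```

Under these assumptions the ranking flips between w = 3 (16) and w = 4 (20); the threshold depends entirely on the chosen weights.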


The real problem is not the non-obvious name (ghc-pandoc instead of
simply pandoc) but rather that: a) some descriptions are badly
written, and b) the 'relevance' scoring function is not "smart" enough
to detect them.



All the best,
simon







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.