GNU bug report logs -
#23647
25.1.50; In man pages, links on hyphenated words don't work
Reported by: Stephen Berman <stephen.berman <at> gmx.net>
Date: Sun, 29 May 2016 09:53:01 UTC
Severity: minor
Tags: patch
Found in version 25.1.50
Done: Stephen Berman <stephen.berman <at> gmx.net>
Bug is archived. No further changes may be made.
Message #17 received at 23647 <at> debbugs.gnu.org:
On Mon, 30 May 2016 03:22:58 +0300 Eli Zaretskii <eliz <at> gnu.org> wrote:
>> From: Stephen Berman <stephen.berman <at> gmx.net>
>> Cc: 23647 <at> debbugs.gnu.org
>> Date: Mon, 30 May 2016 01:09:21 +0200
>>
>> > Is it only the ASCII hyphen/minus, or could there be other characters
>> > (e.g., if Groff/troff are invoked with some exotic -Tfoo switch)?
>>
>> That possibility didn't occur to me but according to Wikipedia, groff
>> also outputs soft hyphens (octal 255) and indeed I see that the function
>> Man-build-references-alist, which also removes hyphenation (in a more
>> complicated way that doesn't seem to be needed in the present case),
>> also takes the soft hyphen into account. That can be done here too by
>> changing the above string-match regexp to "[-]". If someone knows of
>> other possibilities allowed by [gt]roff, maybe the regexp could be
>> further extended, or the condition reformulated as required. What do
>> you think?
>
> I'm not enough of a roff expert to tell, but how about asking on the
> Groff list?
I did that and got this feedback from Steffen Nurpmeso:
> I have been convinced that soft hyphen is a control character and
> not something visual, it should be used as a «break-indicator»
> rather than as a hyphenation character, interpretation of which is
> left as an exercise for the processing software. I have no idea
> still but would guess groff uses "hyphen minus" U+002D or hyphen
> U+2010 if Unicode is possible.
In a followup to another response he added:
> For display purposes however i think U+00AD can't be used
> directly, but will be replaced by the renderer to either nothing,
> if no wrap is to be applied at the character position, or
> something appropriate, like ASCII hyphen-minus or some extended
> Unicode "Pd" letter, of which there are some (e.g., U+058A
> ARMENIAN HYPHEN, U+1400 CANADIAN SYLLABICS HYPHEN, and more).
And he also made this suggestion:
> Eli Zaretskii is so active on the
> Unicode list, why don't you use the Pd character class for
> detecting «hyphen»? I guess this should cover all such things
> already as of today, thanks to Werner Lemberg?!
So how should we proceed from here? We could add U+2010 to the regexp
in my patch, which would then be this: "[-‐]" (hyphen-minus (ASCII 45),
hyphen (U+2010), soft hyphen (U+00AD) -- it seems harmless to retain the
latter, given that man.el already uses it elsewhere), but if these are
all included in the Unicode Pd character class along with other possible
hyphen characters, maybe a different approach is required. I know
nothing about the Pd character class and how to detect it with Elisp; I
also don't know if doing that would lead to further changes in man.el,
making this a larger undertaking. What do you suggest?
Steve Berman
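
The two approaches weighed in this message can be sketched in Emacs Lisp as follows. This is only an illustration under stated assumptions, not the patch under discussion: `my-hyphen-regexp' and `my-hyphen-like-p' are hypothetical names that do not exist in man.el. The sketch spells the characters as escapes so that the otherwise invisible soft hyphen is explicit, and it relies on the built-in `get-char-code-property', which returns the Unicode general category of a character as a symbol.

```elisp
;; Illustrative sketch only; `my-hyphen-regexp' and `my-hyphen-like-p'
;; are hypothetical names, not part of man.el or of the proposed patch.

;; Approach 1: an explicit character class.  The leading "-" is literal
;; because it comes first inside the brackets.
(defconst my-hyphen-regexp "[-\u2010\u00AD]"
  "Match hyphen-minus (U+002D), hyphen (U+2010), or soft hyphen (U+00AD).")

;; Approach 2: ask for the Unicode general category instead of
;; enumerating characters.  The soft hyphen is tested separately
;; because its general category is Cf (format), not Pd.
(defun my-hyphen-like-p (char)
  "Return non-nil if CHAR is a soft hyphen or has general category Pd."
  (or (eq char #xAD)
      (eq (get-char-code-property char 'general-category) 'Pd)))
```

For example, (string-match-p my-hyphen-regexp "foo-") finds the trailing hyphen, and (my-hyphen-like-p #x058A) is non-nil for ARMENIAN HYPHEN. One caveat with the category test: Pd is the "dash punctuation" class, so it also contains dashes such as U+2013 EN DASH and U+2014 EM DASH, which may be broader than what is wanted for undoing hyphenation.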