GNU bug report logs - #46422
'pr' screws up tabstops in multicolumn outpt?

Previous Next

Package: coreutils;

Reported by: Leonard Janis Robert König <ljrk <at> ljrk.org>

Date: Wed, 10 Feb 2021 13:21:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Sat, 13 Feb 2021 21:15:56 +0100
Hi,

On 13.02.21 19:29, Leonard Janis Robert König wrote:
> 
> first:  Thank you very much for the work, I really owe you one!

You're welcome. :-)

> On Sat, 2021-02-13 at 17:58 +0100, Erik Auerswald wrote:
>> On 13.02.21 15:17, Erik Auerswald wrote:
>>> On 11.02.21 20:20, Erik Auerswald wrote:
>>>> On Thu, Feb 11, 2021 at 06:09:28PM +0100, Leonard Janis Robert
>>>> König
>>>> wrote:
>>>>> On Thu, 2021-02-11 at 16:45 +0100, Erik Auerswald wrote:
>>>>>> On Thu, Feb 11, 2021 at 04:12:54PM +0100, Leonard Janis
>>>>>> Robert
>>>>>> König wrote:
>>>>>>> On Thu, 2021-02-11 at 13:00 +0100, Erik Auerswald wrote:
>>>>>>>> On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis
>>>>>>>> Robert
>>>>>>>> König wrote:
>>>>>>>>> I'm sorry if I this is not a bug but to be expected,
>>>>>>>>> but I thnk
>>>>>>>>> pr doesn't get the alignment of tabs in multicolumn
>>>>>>>>> output
>>>>>>>>> right.  [...]  This seems *kind* of related to multi-
>>>>>>>>> column
>>>>>>>>> merged output, as was discussed some years ago here:
>>>>>>>>> https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html
>>>>>>>>
>>>>>>>> This thread contains the bug-introducing patch in message
>>>>>>>> https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00160.html
>>>>>>>>   
>>>>>>>> This is commit 553d347d3e08e00ee4f9df520b37c964c3f26e28.
>>>>>>>> That commit removed the 'assume -e' part of the POSIX
>>>>>>>> description
>>>>>>>> of the -COLUMN option from GNU pr.
>>>>>>> [...]
>>>> I have found a fix to the problem described by you.  I am quite
>>>> sure that
>>>> this is not *correct*, but I did not find a way to make
>>>> print_sep_string()
>>>> account for tabs that did not break quite a few existing tests,
>>>> even if
>>>> the merged files problem from 2007 and this columnating bug were
>>>> both
>>>> fixed.  Thus I just tighten the 2007 bug fix to apply in less
>>>> cases.
>>>> This way all existing tests pass, and a new one pertaining to
>>>> this bug
>>>> report passes, too.  I do think this is in the same spirit as the
>>>> "fix"
>>>> from 2007 (commit 553d347d3e08e00ee4f9df520b37c964c3f26e28).
>>>
>>> I think the attached patch is a better fix than my previous one,
>>> because it applies the special treatment of TAB as separator more
>>> consistently.  It may still not be complete (the code seems quite
>>> convoluted to me) but I do think it improves the situation
>>> significantly, and does not make it worse.
> 
> Hm, I'm not sure whether I understand this special case.  When we have
> a tab as column separator, doesn't this imply that the second column is
> starting on a position n*8, (effectively equivalent to the first
> column), thus guaranteeing that the alignment is honored?  So, if my

Whatever the reason (perhaps conforming to POSIX, perhaps other pr
implementations doing the same), GNU pr implements a special
treatment for TAB as column separator, and the thread from 2007
implies that the pr from HP-UX does as well.

The POSIX spec says:

    "-s[char]

     Separate text columns by the single character char instead of
     by the appropriate number of <space> characters (default for
     char shall be <tab>)."

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pr.html

So use of -s needs to always result in one separator character
between columns.

This is implemented by GNU pr, and seemingly by pr from HP-UX, too
(https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html)

Of all the printable ASCII characters, only TAB results in
interactions with "Tabification," i.e., turning TABs into spaces
on input and spaces into TABs on output.  Thus only TAB as
separator may require the special treatment of disabling
"Tabification."

Omitting this special treatment resulted in the bug from 2007.

Removing the implicit "-e" and "-i" from "-NUMBER" and "-m"
to fix the 2007 bug resulted in this bug (bug#46422), and does
not conform to the POSIX specification nor to the GNU pr info
documentation.

My v3 patch restricts this special treatment of "-s" to just
the cases where it is used without specifying a separator and
thus using the default of TAB, or when it is used with a single
TAB ("-s$'\t'").  Thus it restricts the 2007 change from commit
553d347d3e08e00ee4f9df520b37c964c3f26e28 to affect only those
use cases it should affect, instead of all multi-column use cases.

It may be possible to add some appropriate special treatment for
TAB as separator without disabling "Tabification."  But I do not
know how.  Just accounting for the output position change resulting
from printing a TAB in print_sep_string() does not work, i.e.,
breaks many of the existing tests.

> [...]
> That being said, I don't see this exact distinction reflected in the
> code, so perhaps I just misunderstood.

Disabling "Tabification" only when "-s" was active is missing.  That
resulted in the 2007 bug.  Making the needed special treatment always
used fixed the 2007 bug, but broke your use case.

That some special treatment is needed and intended can be gleaned
from the following comment (with line numbers from pr.c in the
current master branch @ 2de30c7350a77b091afa1eb284acdf082c0f6aa5):

1031  /* It's rather pointless to define a TAB separator with column
1032     alignment */

My patch adds the special treatment, since it works both for the 2007
bug and this bug (bug#46422).

>> It seems to me as if "untabify_input = true;" should be re-introduced
>> in one additional place to fix the regression from commit 553d347,
>> please see the attached patch version 3.
>>
>>> I'd like to ask the GNU Coreutils maintainers to consider merging
>>> the attached patch.
>>
>> The latest version, i.e., v3 for now.
> 
> I can only second this, with the patch my rather obscure (and complex)
> use case of printing thousands of lines of code works properly now!

Thanks for testing!

Thanks,
Erik





This bug report was last modified 4 years and 99 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.