GNU bug report logs - #46422
'pr' screws up tabstops in multicolumn outpt?


Package: coreutils;

Reported by: Leonard Janis Robert König <ljrk <at> ljrk.org>

Date: Wed, 10 Feb 2021 13:21:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 46422 in the body.
You can then email your comments to 46422 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Wed, 10 Feb 2021 13:21:02 GMT)

Acknowledgement sent to Leonard Janis Robert König <ljrk <at> ljrk.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 10 Feb 2021 13:21:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Leonard Janis Robert König <ljrk <at> ljrk.org>
To: bug-coreutils <at> gnu.org
Subject: 'pr' screws up tabstops in multicolumn outpt?
Date: Wed, 10 Feb 2021 13:42:29 +0100
I'm sorry if this is not a bug but expected behavior, but I think pr
doesn't get the alignment of tabs in multicolumn output right.

Consider the following test input, where each gap between consecutive
x's is a single tab (tab width 8):

123456781234567812345678123456781
x	x	x	x	x
123456781234567812345678123456781
x	x	x	x	x

Run it through multicolumn pr, e.g.,

    pr -t -2 test > out

The output looks like:

123456781234567812345678123456781   123456781234567812345678123456781
x	x	x	x	x   x	x	x	x	x

That is, the x's aren't aligned anymore.  In contrast, on a SunOS 5.10
machine, I get:

123456781234567812345678123456781   123456781234567812345678123456781
        x       x       x       x           x       x       x       x

Basically, SunOS pr notices that it can no longer print "\tx\tx\tx\tx",
since the separation between the columns has shifted the tab stops.
Instead it prints "\t     x\t     x\t     x\t     x".
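A minimal C sketch of the first half of what SunOS pr appears to do: expand each TAB relative to the start of the text column it belongs to, so the column's internal alignment no longer depends on where that column lands on the page. The function name `expand_column` and the fixed tab width of 8 are assumptions for illustration; this is not code from any pr implementation.

```c
#include <stddef.h>

#define TAB_WIDTH 8 /* assumed tab width, as in the report */

/* Copy SRC to DST, expanding each TAB to spaces up to the next tab stop
   counted from the start of SRC's own text column (position 0), not from
   the left edge of the page.  At most DSTSIZE - 1 characters are written,
   followed by a terminating NUL.  Returns the length of the result. */
size_t expand_column(const char *src, char *dst, size_t dstsize)
{
    size_t col = 0; /* column-relative position of the next character */

    if (dstsize == 0)
        return 0;
    for (; *src != '\0' && col + 1 < dstsize; src++) {
        if (*src == '\t') {
            /* Pad with spaces up to the next column-relative tab stop. */
            size_t next = (col / TAB_WIDTH + 1) * TAB_WIDTH;
            while (col < next && col + 1 < dstsize)
                dst[col++] = ' ';
        } else {
            dst[col++] = *src;
        }
    }
    dst[col] = '\0';
    return col;
}
```

For the report's input line "x\tx\tx\tx\tx" this puts the x's at column-relative positions 0, 8, 16, 24 and 32; placing the column on the page and re-tabbing the result would be separate steps.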

This bug only occurs with certain page widths, obviously.  A first
workaround is to use `-s` to separate the columns with tabs.
Unfortunately, this only works as long as the line lengths actually
allow the next column to start at a multiple of 8.  E.g., if I also pass
`-w 132`, it breaks again; here, too, SunOS does the expected thing.

This seems *kind* of related to multi-column merged output, as was
discussed some years ago here:
https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html

Unfortunately the POSIX spec is, in my reading, a bit unclear here.
But I think the behavior of GNU/pr is rather unexpected when printing
multicolumn source code, and not in line with what the original authors
intended.

The outline of the fix would be to calculate the starting column
position and then re-tab.  For now, I can work around it with the
following:

    $ pr -e -t test | pr -t -2 > out
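The "calculate the starting column position" step can be sketched in C, assuming equal-width columns of `colwidth` characters separated by `sepwidth` characters, so that column k starts at k * (colwidth + sepwidth). A width of 35 with a one-character separator would put the second column at 36, matching the output shown above, but whether GNU pr computes its column widths exactly this way is not established here. The name `place_columns` and its parameters are hypothetical, and the column strings are assumed to be tab-free already (for example, expanded as with `pr -e`).

```c
#include <stddef.h>
#include <string.h>

/* Lay out NCOLS tab-free column strings on one output line.
   Column k starts at absolute position k * (COLWIDTH + SEPWIDTH),
   is truncated to COLWIDTH characters, and the gap before it is
   filled with spaces.  Nothing is appended after the last column's
   text.  LINE must hold at least NCOLS * (COLWIDTH + SEPWIDTH) + 1
   bytes. */
void place_columns(const char *const cols[], size_t ncols,
                   size_t colwidth, size_t sepwidth, char *line)
{
    size_t end = 0; /* one past the last character written so far */

    for (size_t k = 0; k < ncols; k++) {
        size_t start = k * (colwidth + sepwidth);
        size_t len = strlen(cols[k]);

        while (end < start) /* pad up to this column's start */
            line[end++] = ' ';
        if (len > colwidth)
            len = colwidth; /* truncate overlong column text */
        memcpy(line + start, cols[k], len);
        end = start + len;
    }
    line[end] = '\0';
}
```

The re-tab step would then run over the finished line, so tab stops are measured from the left edge of the page rather than from each column.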

What do you think?

~leo



$ pr --version
pr (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Pete TerMaat and Roland Huebner.
$ uname -a
Linux hoopyfrood 5.10.14-arch1-1 #1 SMP PREEMPT Sun, 07 Feb 2021 22:42:17 +0000 x86_64 GNU/Linux



Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Wed, 10 Feb 2021 15:45:01 GMT)

Message #8 received at 46422 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Wed, 10 Feb 2021 16:44:14 +0100
Hi,

On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis Robert König wrote:
> I'm sorry if I this is not a bug but to be expected, but I thnk pr
> doesn't get the alignment of tabs in multicolumn output right.
> 
> Consider the following test input, where everything from x->x is a tab
> (with tabs 8):

Email quoting disturbs the alignment with tabs, thus I omit those
examples.

> Run it through multicolumn pr, e.g.,
> 
>     pr -t -2 test > out
> 
> The output looks [garbled.]
> [...]
> In contrast, on a SunOS 5.10 machine, I get:
> 
> 123456781234567812345678123456781   123456781234567812345678123456781
>         x       x       x       x           x       x       x       x

This is missing the first 'x'; did you use a different input file?

> Basically, SunOS pr notices, that it cannot print "\tx\tx\tx\tx"
> anymore, since the separation between the pages messed that up.
> Instead it prints "\t     x\t     x\t     x\t     x".

You can work around the issue by using "expand" to change tabs into
spaces before using "pr":

$ expand test | pr -t -2
123456781234567812345678123456781   123456781234567812345678123456781
x	x	x	x	x   x	    x	    x	    x	    x

> [...]
> This seems *kind* of related to multi-column merged
> output, as was discussed some years ago here:
> https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html

Just keeping tabs for the second column cannot always work.

> [...]
> What do you think?

It seems to me the approach of "expand"ing the tabs to spaces before using
"pr" is the most general.

Thanks,
Erik
-- 
Be water, my friend.
                        -- Bruce Lee




Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Wed, 10 Feb 2021 16:07:01 GMT)

Message #11 received at 46422 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Wed, 10 Feb 2021 17:06:49 +0100
Hi,

On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis Robert König wrote:
> I'm sorry if I this is not a bug but to be expected, but I thnk pr
> doesn't get the alignment of tabs in multicolumn output right.
> [...]
> Unfortunately the POSIX spec is, in my reading, a bit unclear here. 

I do not think so:

-column
  Produce multi-column output that is arranged in column columns[...].
  The options -e and -i shall be assumed for multiple text-column output.

-e[char][gap]
  Expand each input <tab> to the next greater column position[...].

-i[char][gap]
  In output, replace multiple <space>s with <tab>s wherever[...].

https://pubs.opengroup.org/onlinepubs/009695399/utilities/pr.html
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pr.html

The way I read the POSIX spec, "pr" needs to account for the column
separation by adjusting whitespace while still using tab characters in
the output.
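A minimal C sketch of the -i half of that reading: once the text columns have been laid out with spaces, runs of spaces are replaced by TABs measured from the left edge of the page rather than from the start of each text column. The name `tabify_line`, the tab width of 8, and the rule that a run of a single space is never turned into a TAB are assumptions for illustration, not taken from pr.c.

```c
#include <stddef.h>

#define TAB_WIDTH 8 /* assumed tab width */

/* Copy SRC (which must contain no TABs) to DST, replacing runs of
   spaces by TABs wherever a TAB reaches an absolute tab stop, i.e. a
   stop measured from the left edge of the output line.  TABs are only
   considered while at least two columns of padding remain, so a run of
   a single space is never converted.  DST needs room for
   strlen(SRC) + 1 bytes; the result is never longer than SRC. */
void tabify_line(const char *src, char *dst)
{
    size_t col = 0;     /* absolute column of the next output character */
    size_t pending = 0; /* spaces read but not yet written */

    for (;; src++) {
        if (*src == ' ') {
            pending++;
            continue;
        }
        /* Flush the pending run of spaces, preferring TABs. */
        size_t goal = col + pending;
        while (goal - col > 1) {
            size_t next = (col / TAB_WIDTH + 1) * TAB_WIDTH;
            if (next > goal)
                break;
            *dst++ = '\t';
            col = next;
        }
        while (col < goal) {
            *dst++ = ' ';
            col++;
        }
        pending = 0;
        if (*src == '\0')
            break;
        *dst++ = *src;
        col++;
    }
    *dst = '\0';
}
```

Applied to the report's two-column line after expansion (x's at page columns 0, 8, 16, 24, 32 and 36, 44, 52, 60, 68), this sketch produces "x\tx\tx\tx\tx", three spaces, then "x\t    x\t    x\t    x\t    x", the same line as the expected output of the 'mcol-w-tabs' test proposed later in this thread.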

> But I think the behavior of GNU/pr is rather unexpected when printing
> multicolumn source code and not in line what the original authors
> intended.

I concur.

Thanks,
Erik
-- 
[M]ost parts of this industry just work by chance.
                        -- Thomas Gleixner




Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Wed, 10 Feb 2021 16:39:02 GMT)

Message #14 received at 46422 <at> debbugs.gnu.org:

From: Leonard Janis Robert König <ljrk <at> ljrk.org>
To: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Wed, 10 Feb 2021 17:10:23 +0100
Hi,

On Wed, 2021-02-10 at 16:44 +0100, Erik Auerswald wrote:
> Hi,
> 
> On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis Robert König
> wrote:
> > I'm sorry if I this is not a bug but to be expected, but I thnk pr
> > doesn't get the alignment of tabs in multicolumn output right.
> > 
> > Consider the following test input, where everything from x->x is a
> > tab (with tabs 8):
> 
> Email quoting disturbs the alignment with tabs, thus I omit those
> examples.
> 
> > Run it through multicolumn pr, e.g.,
> > 
> >     pr -t -2 test > out
> > 
> > The output looks [garbled.]
> > [...]
> > In contrast, on a SunOS 5.10 machine, I get:
> > 
> > 123456781234567812345678123456781   123456781234567812345678123456781
> >         x       x       x       x           x       x       x       x
> 
> This is lacking the first 'x', did you use a different input file?

Yes, my bad, I toyed around a bit.  The result, with the same input
file, is as I described:

123456781234567812345678123456781   123456781234567812345678123456781
x       x       x       x       x   x       x       x       x       x

> 
> > Basically, SunOS pr notices, that it cannot print "\tx\tx\tx\tx"
> > anymore, since the separation between the pages messed that up.
> > Instead it prints "\t     x\t     x\t     x\t     x".
> 
> You can work around the issue by using "expand" to change tabs into
> spaces before using "pr":
> 
> $ expand test | pr -t -2
> 123456781234567812345678123456781   123456781234567812345678123456781
> x       x       x       x       x   x       x       x       x       x
> 
> > [...]
> > This seems *kind* of related to multi-column merged
> > output, as was discussed some years ago here:
> > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html
> 
> Just keeping tabs for the second column cannot always work.

But that seems to be what GNU pr tries to do.

> > [...]
> > What do you think?
> 
> It seems to me the approach of "expand"ing the tabs to spaces before
> using "pr" is the most general.

It is indeed the most general approach, but it would be quite a
caveat for anyone who prints tabbed source code in a multicolumn
format whose columns do not happen to start at multiples of 8
characters.  I can confirm that FreeBSD mirrors the SunOS behavior as
well.

~ leo






Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Wed, 10 Feb 2021 18:33:02 GMT)

Message #17 received at 46422 <at> debbugs.gnu.org:

From: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Wed, 10 Feb 2021 19:31:49 +0100
On Wed, 2021-02-10 at 17:06 +0100, Erik Auerswald wrote:
> Hi,
> 
> On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis Robert König
> wrote:
> > I'm sorry if I this is not a bug but to be expected, but I thnk pr
> > doesn't get the alignment of tabs in multicolumn output right.
> > [...]
> > Unfortunately the POSIX spec is, in my reading, a bit unclear here.
> 
> I do not think so:
> 
> -column
>   Produce multi-column output that is arranged in column columns[...].
>   The options -e and -i shall be assumed for multiple text-column output.
> 
> -e[char][gap]
>   Expand each input <tab> to the next greater column position[...].
> 
> -i[char][gap]
>   In output, replace multiple <space>s with <tab>s wherever[...].
> 
> https://pubs.opengroup.org/onlinepubs/009695399/utilities/pr.html
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pr.html
> 
> The way I read the POSIX spec, "pr" needs to account for the column
> separation by adjusting whitespace while still using tab characters in
> the output.

What is unclear, however, is whether this whitespace adjustment
(replacing spaces with tabs, which is the default) should happen on
input or after the text has been arranged into columns.  The former
produces something resembling the GNU output, since the tabs from the
input files are kept even though they no longer align to 8
characters.  The latter produces the SunOS/FreeBSD output, since the
text is arranged first and the tabs are recalculated afterwards.
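The difference between the two orders is easy to quantify. In the report's example the second text column starts at page column 36 (33 characters plus three spaces of padding). Its first TAB follows an x, so it sits at column-relative position 1, i.e. page column 37. The C sketch below, with hypothetical function names and an assumed tab width of 8, computes where that TAB lands when copied verbatim versus where column-relative tab stops would put it.

```c
#include <stddef.h>

#define TAB_WIDTH 8 /* assumed tab width */

/* Where a TAB emitted at absolute page column COL ends up: this is what
   happens when an input TAB is copied into the output unchanged. */
size_t raw_tab_target(size_t col)
{
    return (col / TAB_WIDTH + 1) * TAB_WIDTH;
}

/* Where the same TAB should end up if tab stops are counted from the
   start of its text column: the TAB sits at position REL inside a
   column whose first character is at absolute page column ORIGIN. */
size_t column_tab_target(size_t origin, size_t rel)
{
    return origin + (rel / TAB_WIDTH + 1) * TAB_WIDTH;
}
```

Here raw_tab_target(37) is 40 while column_tab_target(36, 1) is 44, so every x after a TAB in the second column ends up four columns short of its intended position.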

> > But I think the behavior of GNU/pr is rather unexpected when printing
> > multicolumn source code and not in line what the original authors
> > intended.
> 
> I concur.

This makes tab-containing input basically unusable with multiple
columns or merging, though, as the characters end up completely
misaligned.  This is especially bad for code.  My use case is a script
that prints the UNIX v6 source code in the fashion of John Lions (see
v6.cuzoco.com), and it breaks under GNU pr, but not on BSD or Solaris.
It also seems to have worked on Amdahl UTS in the late 80s, for that
matter.

With regard to the original intent, I just tested this on Research
UNIX v6 with SIMH
(https://gunkies.org/wiki/Installing_Unix_v6_(PDP-11)_on_SIMH), and
indeed the output matches that of the SunOS and FreeBSD versions:

# pr -t -2 foo
123456781234567812345678123456781   123456781234567812345678123456781
x       x       x       x       x   x       x       x       x       x

N.B. the input file must be a little longer, since the original
version didn't produce text columns of identical vertical length,
which the standard allows:

> Whether or not text  columns  are produced with identical vertical
> lengths is unspecified [...] .

At least to me, this behavior of GNU pr came as a big surprise; maybe
it should be described in the FAQ/Gotchas or the BUGS section of the
manual?

~ leo





Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Thu, 11 Feb 2021 12:01:01 GMT)

Message #20 received at 46422 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Thu, 11 Feb 2021 13:00:27 +0100
Hi,

On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis Robert König wrote:
> I'm sorry if I this is not a bug but to be expected, but I thnk pr
> doesn't get the alignment of tabs in multicolumn output right.
> [...]
> This seems *kind* of related to multi-column merged output, as was
> discussed some years ago here:
> https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html

This thread contains the bug-introducing patch in message
https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00160.html

This is commit 553d347d3e08e00ee4f9df520b37c964c3f26e28.

That commit removed the 'assume -e' part of the POSIX description of
the -COLUMN option from GNU pr.

Reverting this patch (i.e., adding the one deleted line back to pr.c)
fixes *this* bug, but then re-introduces the bug reported in 2007, i.e.,
sub-test 'merge-w-tabs' of test 'pr-tests.pl' fails.

Thanks,
Erik
-- 
If it ain't broke, don't fix it.




Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Thu, 11 Feb 2021 17:10:01 GMT)

Message #23 received at 46422 <at> debbugs.gnu.org:

From: Leonard Janis Robert König <ljrk <at> ljrk.org>
To: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Thu, 11 Feb 2021 18:09:28 +0100
Hi,

On Thu, 2021-02-11 at 16:45 +0100, Erik Auerswald wrote:
> Hi,
> 
> On Thu, Feb 11, 2021 at 04:12:54PM +0100, Leonard Janis Robert König
> wrote:
> > On Thu, 2021-02-11 at 13:00 +0100, Erik Auerswald wrote:
> > > On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis Robert
> > > König wrote:
> > > > I'm sorry if I this is not a bug but to be expected, but I thnk
> > > > pr doesn't get the alignment of tabs in multicolumn output right.
> > > > [...]
> > > > This seems *kind* of related to multi-column merged output, as
> > > > was discussed some years ago here:
> > > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html
> > > 
> > > This thread contains the bug-introducing patch in message
> > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00160.html
> > > 
> > > This is commit 553d347d3e08e00ee4f9df520b37c964c3f26e28.
> > 
> > ah, thanks for digging, I read the message but must have missed
> > the patch.
> > 
> > > That commit removed the 'assume -e' part of the POSIX description
> > > of the -COLUMN option from GNU pr.
> > 
> > Which is definitely against POSIX, yeah.  The problem, however, is
> > not POSIX, but that the test case would succeed on a POSIX-conforming
> > implementation, despite -e being assumed.  The process should be:
> > 
> > 1. read in while untabify input (-i)
> > 2. write out while (re-)tabify output (-e)
> 
> The options are the other way around: -e expands tabs while reading
> input, -i unexpands spaces on output.

Ah, yes, but the same thought applies.

> Your test case requires expanding tabs during input, which is the
> reason that "expand | pr" could be used as a workaround (with
> "expand | pr | unexpand", pr would not need to mess with tabs at all,
> but I do think that GNU pr is currently buggy and should be fixed).

Absolutely, expand would be a workaround (I happen to use `pr -e | pr`
in my script, for other reasons).

I've looked a bit further through the code, but there isn't a single
place that can be changed without introducing other bugs again.  For
now I can only put it on my to-do list, but I have no idea when I'll
get around to it.

~leo

> P.S. Feel free to re-add the bug tracker / list email address if you
> like.

P.P.S:  Yeah, I should configure my mailer to warn if I don't answer
everyone.






Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Thu, 11 Feb 2021 19:21:02 GMT)

Message #26 received at 46422 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Thu, 11 Feb 2021 20:20:05 +0100
Hi,

On Thu, Feb 11, 2021 at 06:09:28PM +0100, Leonard Janis Robert König wrote:
> On Thu, 2021-02-11 at 16:45 +0100, Erik Auerswald wrote:
> > On Thu, Feb 11, 2021 at 04:12:54PM +0100, Leonard Janis Robert
> > König wrote:
> > > On Thu, 2021-02-11 at 13:00 +0100, Erik Auerswald wrote:
> > > > On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis Robert
> > > > König wrote:
> > > > > I'm sorry if I this is not a bug but to be expected, but I thnk
> > > > > pr doesn't get the alignment of tabs in multicolumn output
> > > > > right.  [...]  This seems *kind* of related to multi-column
> > > > > merged output, as was discussed some years ago here:
> > > > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html
> > > > 
> > > > This thread contains the bug-introducing patch in message
> > > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00160.html
> > > > 
> > > > This is commit 553d347d3e08e00ee4f9df520b37c964c3f26e28.
> > > 
> > > ah, thanks for digging, I read the message but must have missed
> > > the patch.
> > > 
> > > > That commit removed the 'assume -e' part of the POSIX description
> > > > of the -COLUMN option from GNU pr.
> > > [...]
> > Your test case requires expanding tabs during input, which is
> > the reason that "expand | pr" could be used as a workaround (with
> > "expand | pr | unexpand", pr would not need to mess with tabs at all,
> > but I do think that GNU pr is currently buggy and should be fixed).
> 
> Absolutely, expand would be a workaround (I happen to use `pr -e | pr`
> in my script, for other reasons).
> 
> I've looked a bit further through the code but there's hardly a single
> place that needs to be touched in order to not introduce other bugs
> again.  For now I can only put it on my to-do list to fix, but no idea
> when I get around doing it.

I have found a fix to the problem described by you.  I am quite sure that
this is not *correct*, but I did not find a way to make print_sep_string()
account for tabs without breaking quite a few existing tests, even when
both the merged-files problem from 2007 and this columnating bug were
fixed.  Thus I just tighten the 2007 bug fix to apply in fewer cases.
This way all existing tests pass, and a new one pertaining to this bug
report passes, too.  I do think this is in the same spirit as the "fix"
from 2007 (commit 553d347d3e08e00ee4f9df520b37c964c3f26e28).

See the following inline patch:

--------8<--------
diff --git a/src/pr.c b/src/pr.c
index 22d032ba3..ad1e36769 100644
--- a/src/pr.c
+++ b/src/pr.c
@@ -1237,6 +1237,8 @@ init_parameters (int number_of_files)
         col_sep_string = column_separator;
 
       truncate_lines = true;
+      if (!parallel_files)
+        untabify_input = true;
       tabify_output = true;
     }
   else
diff --git a/tests/pr/pr-tests.pl b/tests/pr/pr-tests.pl
index b7d868cf8..0894d3804 100755
--- a/tests/pr/pr-tests.pl
+++ b/tests/pr/pr-tests.pl
@@ -474,6 +474,12 @@ push @Tests,
     {IN=>{2=>"a\n"}},
      {OUT=>"a\t\t\t\t  \t\t\ta\n"} ];
 
+# Exercise a bug with pr -t -2 (bug #46422)
+push @Tests,
+   ['mcol-w-tabs', '-t -2',
+    {IN=>"x\tx\tx\tx\tx\nx\tx\tx\tx\tx\n"},
+     {OUT=>"x\tx\tx\tx\tx   x\t    x\t    x\t    x\t    x\n"} ];
+
 @Tests = triple_test \@Tests;
 
 my $save_temps = $ENV{DEBUG};
-------->8--------

It is up to the GNU Coreutils maintainers if they want to add this
additional band-aid to the interesting 'pr' code or not.  Adherents to
test-driven development would probably like this approach.  ;-)

Thanks,
Erik
-- 
Bugs are like mushrooms - found one, look around for more...
                        -- Al Viro




Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Sat, 13 Feb 2021 14:18:02 GMT)

Message #29 received at 46422 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: [PATCH] Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt?
Date: Sat, 13 Feb 2021 15:17:08 +0100
On 11.02.21 20:20, Erik Auerswald wrote:
> [...]
> I have found a fix to the problem described by you.  I am quite sure that
> this is not *correct*, but I did not find a way to make print_sep_string()
> account for tabs that did not break quite a few existing tests, even if
> the merged files problem from 2007 and this columnating bug were both
> fixed.  Thus I just tighten the 2007 bug fix to apply in less cases.
> This way all existing tests pass, and a new one pertaining to this bug
> report passes, too.  I do think this is in the same spirit as the "fix"
> from 2007 (commit 553d347d3e08e00ee4f9df520b37c964c3f26e28).

I think the attached patch is a better fix than my previous one,
because it applies the special treatment of TAB as separator more
consistently.  It may still not be complete (the code seems quite
convoluted to me) but I do think it improves the situation
significantly, and does not make it worse.

The code does not try to create equal width columns when using a
TAB as column separator.  This is made clear through comments:

1018 /* Tabification is assumed for multiple columns. */
...
1031     /* It's rather pointless to define a TAB separator with column
1032        alignment */

Thus the intent of the code seems to be to follow the general idea
of using equal width columns by "assuming Tabification," i.e.,
working as if the options -e and -i were given, as specified by
POSIX, unless the column separator has been changed to a TAB.
The attached patch results in following through with this in more
cases, fixing this bug (bug#46422) without introducing known
regressions.

The patch adds more test cases: one identical to the new test from
my previous patch, and another that generalizes the case from 2007
to use '-2 -s' to trigger the special treatment with TAB as separator.

Creating three-column output as done in the bug report from 2007
automatically aligns the columns with the default TAB stops of pr,
so the patch adds another variant of the 2007 case, merging two
files.  Merging files (-m) takes a slightly different code path
from -NUMBER, although both create columns.

I'd like to ask the GNU Coreutils maintainers to consider merging
the attached patch.

Thanks,
Erik
[coreutils-pr-fix_bug_46422.v2.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Sat, 13 Feb 2021 16:59:02 GMT)

Message #32 received at 46422 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in
 multicolumn outpt?
Date: Sat, 13 Feb 2021 17:58:33 +0100
Hi,

On 13.02.21 15:17, Erik Auerswald wrote:
> On 11.02.21 20:20, Erik Auerswald wrote:
>> [...]
> 
> I think the attached patch is a better fix than my previous one,
> because it applies the special treatment of TAB as separator more
> consistently.  It may still not be complete (the code seems quite
> convoluted to me) but I do think it improves the situation
> significantly, and does not make it worse.

It seems to me as if "untabify_input = true;" should be re-introduced
in one additional place to fix the regression from commit 553d347;
please see the attached patch version 3.

> I'd like to ask the GNU Coreutils maintainers to consider merging
> the attached patch.

The latest version, i.e., v3 for now.

Thanks,
Erik
[coreutils-pr-fix_bug_46422.v3.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Sat, 13 Feb 2021 18:30:02 GMT)

Message #35 received at 46422 <at> debbugs.gnu.org:

From: Leonard Janis Robert König <ljrk <at> ljrk.org>
To: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in
 multicolumn outpt?
Date: Sat, 13 Feb 2021 19:29:40 +0100
Hi,

first:  Thank you very much for the work, I really owe you one!

On Sat, 2021-02-13 at 17:58 +0100, Erik Auerswald wrote:
> On 13.02.21 15:17, Erik Auerswald wrote:
> > On 11.02.21 20:20, Erik Auerswald wrote:
> > > [...]
> > 
> > I think the attached patch is a better fix than my previous one,
> > because it applies the special treatment of TAB as separator more
> > consistently.  It may still not be complete (the code seems quite
> > convoluted to me) but I do think it improves the situation
> > significantly, and does not make it worse.

Hm, I'm not sure whether I understand this special case.  When we have
a tab as column separator, doesn't this imply that the second column
starts at a position n*8 (just like the first column), thus guaranteeing
that the alignment is honored?  So, if my brain isn't completely off
track, with -s$'\t' there shouldn't be a difference whether we use -e -i
or not, so there need not be a special case.

That being said, I don't see this exact distinction reflected in the
code, so perhaps I just misunderstood.
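To make the alignment argument concrete: in the bug report's output the second column starts at display column 36 (33 characters plus 3 spaces of padding). The following minimal Python sketch models only how a terminal renders TABs, assuming tab stops every 8 columns; the helper names are hypothetical and this is not pr's code. It shows why raw TABs misalign in the second column, why expanding them first (as -e would) keeps them aligned, and why a column that happens to start on a multiple of 8 stays aligned even with raw TABs.

```python
from typing import List

TAB_WIDTH = 8  # assumption: tab stops every 8 columns, as in the report


def expand_from(text: str, start_col: int) -> str:
    """Expand TABs in `text` as a terminal would if `text` began at
    absolute display column `start_col` (0-based)."""
    out = []
    col = start_col
    for ch in text:
        if ch == "\t":
            width = TAB_WIDTH - col % TAB_WIDTH  # distance to next tab stop
            out.append(" " * width)
            col += width
        else:
            out.append(ch)
            col += 1
    return "".join(out)


def x_offsets(text: str, start_col: int) -> List[int]:
    """Offsets of each 'x', relative to the start of the column."""
    rendered = expand_from(text, start_col)
    return [i for i, ch in enumerate(rendered) if ch == "x"]


raw = "x\tx\tx\tx\tx"                      # column content with TABs intact
print(x_offsets(raw, 0))                   # first column: [0, 8, 16, 24, 32]
print(x_offsets(raw, 36))                  # raw TABs at col 36: [0, 4, 12, 20, 28]
print(x_offsets(expand_from(raw, 0), 36))  # expanded first: [0, 8, 16, 24, 32]
print(x_offsets(raw, 40))                  # start on a tab stop: [0, 8, 16, 24, 32]
```

The last line is the case argued above: if a TAB separator pushes the second column onto a tab stop, raw TABs inside it line up again.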

> 
> It seems to me as if "untabify_input = true;" should be re-introduced
> in one additional place to fix the regression from commit 553d347,
> please see the attached patch version 3.
> 
> > I'd like to ask the GNU Coreutils maintainers to consider merging
> > the attached patch.
> 
> The latest version, i.e., v3 for now.

I can only second this: with the patch, my rather obscure (and
complex) use case of printing thousands of lines of code works
properly now!

~leo





Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Sat, 13 Feb 2021 20:17:01 GMT)

Message #38 received at 46422 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in
 multicolumn outpt?
Date: Sat, 13 Feb 2021 21:15:56 +0100
Hi,

On 13.02.21 19:29, Leonard Janis Robert König wrote:
> 
> first:  Thank you very much for the work, I really owe you one!

You're welcome. :-)

> On Sat, 2021-02-13 at 17:58 +0100, Erik Auerswald wrote:
>> On 13.02.21 15:17, Erik Auerswald wrote:
>>> On 11.02.21 20:20, Erik Auerswald wrote:
>>>> On Thu, Feb 11, 2021 at 06:09:28PM +0100, Leonard Janis Robert König wrote:
>>>>> On Thu, 2021-02-11 at 16:45 +0100, Erik Auerswald wrote:
>>>>>> On Thu, Feb 11, 2021 at 04:12:54PM +0100, Leonard Janis Robert König wrote:
>>>>>>> On Thu, 2021-02-11 at 13:00 +0100, Erik Auerswald wrote:
>>>>>>>> On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis Robert König wrote:
>>>>>>>>> I'm sorry if I this is not a bug but to be expected, but I thnk
>>>>>>>>> pr doesn't get the alignment of tabs in multicolumn output
>>>>>>>>> right.  [...]  This seems *kind* of related to multi-column
>>>>>>>>> merged output, as was discussed some years ago here:
>>>>>>>>> https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html
>>>>>>>>
>>>>>>>> This thread contains the bug-introducing patch in message
>>>>>>>> https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00160.html
>>>>>>>>
>>>>>>>> This is commit 553d347d3e08e00ee4f9df520b37c964c3f26e28.
>>>>>>>> That commit removed the 'assume -e' part of the POSIX description
>>>>>>>> of the -COLUMN option from GNU pr.
>>>>>>> [...]
>>>> I have found a fix to the problem described by you.  I am quite sure
>>>> that this is not *correct*, but I did not find a way to make
>>>> print_sep_string() account for tabs that did not break quite a few
>>>> existing tests, even if the merged files problem from 2007 and this
>>>> columnating bug were both fixed.  Thus I just tighten the 2007 bug fix
>>>> to apply in less cases.  This way all existing tests pass, and a new
>>>> one pertaining to this bug report passes, too.  I do think this is in
>>>> the same spirit as the "fix" from 2007 (commit
>>>> 553d347d3e08e00ee4f9df520b37c964c3f26e28).
>>>
>>> I think the attached patch is a better fix than my previous one,
>>> because it applies the special treatment of TAB as separator more
>>> consistently.  It may still not be complete (the code seems quite
>>> convoluted to me) but I do think it improves the situation
>>> significantly, and does not make it worse.
> 
> Hm, I'm not sure whether I understand this special case.  When we have
> a tab as column separator, doesn't this imply that the second column is
> starting on a position n*8, (effectively equivalent to the first
> column), thus guaranteeing that the alignment is honored?  So, if my

Whatever the reason (perhaps conforming to POSIX, perhaps other pr
implementations doing the same), GNU pr treats TAB as a column
separator specially, and the thread from 2007 implies that the pr
from HP-UX does as well.

The POSIX spec says:

    "-s[char]

     Separate text columns by the single character char instead of
     by the appropriate number of <space> characters (default for
     char shall be <tab>)."

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pr.html

So using -s must always result in exactly one separator character
between columns.

This is implemented by GNU pr, and seemingly by pr from HP-UX, too
(https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html).
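The difference between the default layout and the POSIX -s[char] rule can be shown with a deliberately simplified Python model of joining the cells of one output line. The function name is hypothetical, and the default branch (pad every cell but the last with spaces) is an assumption that ignores pr's truncation and page-width handling.

```python
from typing import List, Optional


def join_cells(cells: List[str], width: int, sep: Optional[str] = None) -> str:
    """Join the column cells of one output line (simplified, not pr.c).

    sep is None: pad every cell except the last with spaces to `width`
                 (a rough model of the layout without -s).
    sep is one character: put exactly one `sep` between cells and pad
                 nothing, as POSIX describes for -s[char].
    """
    if not cells:
        return ""
    if sep is None:
        return "".join(c.ljust(width) for c in cells[:-1]) + cells[-1]
    if len(sep) != 1:
        raise ValueError("-s takes a single character")
    return sep.join(cells)


print(repr(join_cells(["ab", "c"], 4)))        # padded with spaces
print(repr(join_cells(["ab", "c"], 4, "\t")))  # exactly one TAB
print(repr(join_cells(["ab", "c"], 4, ":")))   # exactly one ':'
```

Only the TAB case produces a separator whose display width depends on the column it is printed at, which is what makes it special.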

Of all the characters that can serve as a separator, only TAB
interacts with "Tabification," i.e., turning TABs into spaces on
input and spaces into TABs on output.  Thus only TAB as separator
may require the special treatment of disabling "Tabification."

Omitting this special treatment resulted in the bug from 2007.

Removing the implicit "-e" and "-i" from "-NUMBER" and "-m" to fix
the 2007 bug resulted in this bug (bug#46422), and conforms neither
to the POSIX specification nor to the GNU pr info documentation.

My v3 patch restricts this special treatment of "-s" to the cases
where it is used either without a separator argument (thus using the
default TAB) or with a single TAB ("-s$'\t'").  This way it limits
the 2007 change from commit 553d347d3e08e00ee4f9df520b37c964c3f26e28
to only those use cases it should affect, instead of all multi-column
use cases.
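The rule described above for the patch can be summarized as a small decision function. This is a hypothetical Python model of that description, not the code in pr.c. It covers only input TAB expansion (-e). It assumes that `sep` has already been resolved to a character, with plain -s resolving to its default, TAB.

```python
from typing import Optional

TAB = "\t"


def expand_input_tabs(columns: int, merge: bool, sep: Optional[str]) -> bool:
    """Should TABs be expanded on input (as with -e)?

    Hypothetical model of the rule described in this thread.
    columns: the -NUMBER value (1 = single column)
    merge:   True if -m was given
    sep:     resolved -s separator character, or None without -s
    """
    multi_column = columns > 1 or merge
    single_tab_sep = sep == TAB
    # Multi-column and merged output imply -e, unless a single TAB
    # separates the columns.
    return multi_column and not single_tab_sep


print(expand_input_tabs(2, False, None))  # pr -t -2 (this bug): True
print(expand_input_tabs(2, False, TAB))   # pr -t -2 -s: False
print(expand_input_tabs(1, True, ":"))    # pr -m -s: ...: True
print(expand_input_tabs(1, False, None))  # single column: False
```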

It may be possible to add some appropriate special treatment for
TAB as separator without disabling "Tabification," but I do not
know how.  Simply accounting for the output position change caused
by printing a TAB in print_sep_string() does not work: it breaks
many of the existing tests.
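For illustration only, "accounting for the output position change" of a TAB means that a TAB printed at column c moves the output position to the next tab stop, not to c + 1 as for any other separator character. A hypothetical Python helper (not print_sep_string() itself), assuming tab stops every 8 columns:

```python
def position_after_sep(col: int, sep: str, tab_width: int = 8) -> int:
    """Output column after printing one separator character at `col`."""
    if sep == "\t":
        # A TAB always advances to the next multiple of tab_width.
        return (col // tab_width + 1) * tab_width
    # Any other single character advances by one column.
    return col + 1


print(position_after_sep(33, "\t"))  # 40
print(position_after_sep(33, ":"))   # 34
print(position_after_sep(40, "\t"))  # 48
```

Per the message above, this bookkeeping alone was not enough to keep the existing tests passing.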

> [...]
> That being said, I don't see this exact distinction reflected in the
> code, so perhaps I just misunderstood.

What is missing is disabling "Tabification" only when "-s" is active.
Never disabling it resulted in the 2007 bug; always disabling it fixed
the 2007 bug, but broke your use case.

That some special treatment is needed and intended can be gleaned
from the following comment (with line numbers from pr.c in the
current master branch @ 2de30c7350a77b091afa1eb284acdf082c0f6aa5):

1031  /* It's rather pointless to define a TAB separator with column
1032     alignment */

My patch adds the special treatment, since it works both for the 2007
bug and this bug (bug#46422).

>> It seems to me as if "untabify_input = true;" should be re-introduced
>> in one additional place to fix the regression from commit 553d347,
>> please see the attached patch version 3.
>>
>>> I'd like to ask the GNU Coreutils maintainers to consider merging
>>> the attached patch.
>>
>> The latest version, i.e., v3 for now.
> 
> I can only second this, with the patch my rather obscure (and complex)
> use case of printing thousands of lines of code works properly now!

Thanks for testing!

Thanks,
Erik





Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Sat, 13 Feb 2021 20:29:02 GMT)

Message #41 received at 46422 <at> debbugs.gnu.org:

From: Leonard Janis Robert König <ljrk <at> ljrk.org>
To: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in
 multicolumn outpt?
Date: Sat, 13 Feb 2021 21:28:49 +0100
On Sat, 2021-02-13 at 21:15 +0100, Erik Auerswald wrote:
> Hi,
> 
> On 13.02.21 19:29, Leonard Janis Robert König wrote:
> > 
> > first:  Thank you very much for the work, I really owe you one!
> 
> You're welcome. :-)
> 
> > On Sat, 2021-02-13 at 17:58 +0100, Erik Auerswald wrote:
> > > On 13.02.21 15:17, Erik Auerswald wrote:
> > > > On 11.02.21 20:20, Erik Auerswald wrote:
> > > > > On Thu, Feb 11, 2021 at 06:09:28PM +0100, Leonard Janis Robert König wrote:
> > > > > > On Thu, 2021-02-11 at 16:45 +0100, Erik Auerswald wrote:
> > > > > > > On Thu, Feb 11, 2021 at 04:12:54PM +0100, Leonard Janis Robert König wrote:
> > > > > > > > On Thu, 2021-02-11 at 13:00 +0100, Erik Auerswald wrote:
> > > > > > > > > On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard Janis Robert König wrote:
> > > > > > > > > > I'm sorry if I this is not a bug but to be expected, but I thnk
> > > > > > > > > > pr doesn't get the alignment of tabs in multicolumn output
> > > > > > > > > > right.  [...]  This seems *kind* of related to multi-column
> > > > > > > > > > merged output, as was discussed some years ago here:
> > > > > > > > > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html
> > > > > > > > >
> > > > > > > > > This thread contains the bug-introducing patch in message
> > > > > > > > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00160.html
> > > > > > > > >
> > > > > > > > > This is commit 553d347d3e08e00ee4f9df520b37c964c3f26e28.
> > > > > > > > > That commit removed the 'assume -e' part of the POSIX description
> > > > > > > > > of the -COLUMN option from GNU pr.
> > > > > > > > [...]
> > > > > I have found a fix to the problem described by you.  I am quite sure
> > > > > that this is not *correct*, but I did not find a way to make
> > > > > print_sep_string() account for tabs that did not break quite a few
> > > > > existing tests, even if the merged files problem from 2007 and this
> > > > > columnating bug were both fixed.  Thus I just tighten the 2007 bug fix
> > > > > to apply in less cases.  This way all existing tests pass, and a new
> > > > > one pertaining to this bug report passes, too.  I do think this is in
> > > > > the same spirit as the "fix" from 2007 (commit
> > > > > 553d347d3e08e00ee4f9df520b37c964c3f26e28).
> > > >
> > > > I think the attached patch is a better fix than my previous one,
> > > > because it applies the special treatment of TAB as separator more
> > > > consistently.  It may still not be complete (the code seems quite
> > > > convoluted to me) but I do think it improves the situation
> > > > significantly, and does not make it worse.
> >
> > Hm, I'm not sure whether I understand this special case.  When we have
> > a tab as column separator, doesn't this imply that the second column is
> > starting on a position n*8, (effectively equivalent to the first
> > column), thus guaranteeing that the alignment is honored?  So, if my
> 
> Whatever the reason (perhaps conforming to POSIX, perhaps other pr
> implementations doing the same), GNU pr implements a special
> treatment for TAB as column separator, and the thread from 2007
> implies that the pr from HP-UX does as well.
> 
> The POSIX spec says:
> 
>      "-s[char]
> 
>       Separate text columns by the single character char instead of
>       by the appropriate number of <space> characters (default for
>       char shall be <tab>)."
> 
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pr.html
> 
> So use of -s needs to always result in one separator character
> between columns.
> 
> This is implemented by GNU pr, and seemingly by pr from HP-UX, too
> (https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html)
> 
> Of all the printable ASCII characters, only TAB results in
> interactions with "Tabification," i.e., turning TABs into spaces
> on input and spaces into TABs on output.  Thus only TAB as
> separator may require the special treatment of disabling
> "Tabification."
> 
> Omitting this special treatment resulted in the bug from 2007.
> 
> Removing the implicit "-e" and "-i" from "-NUMBER" and "-m"
> to fix the 2007 bug resulted in this bug (bug#46422), and does
> not conform to the POSIX specification nor to the GNU pr info
> documentation.
> 
> My v3 patch restricts this special treatment of "-s" to just
> the cases where it is used without specifying a separator and
> thus using the default of TAB, or when it is used with a single
> TAB ("-s$'\t'").  Thus it restricts the 2007 change from commit
> 553d347d3e08e00ee4f9df520b37c964c3f26e28 to affect only those
> use cases it should affect, instead of all multi-column use cases.
> 
> It may be possible to add some appropriate special treatment for
> TAB as separator without disabling "Tabification."  But I do not
> know how.  Just accounting for the output position change resulting
> from printing a TAB in print_sep_string() does not work, i.e.,
> breaks many of the existing tests.

This makes sense then, yes.  In theory, that would be the best
approach, but I don't see it happening with the current code base.
The issue is not so much enabling or disabling tabification (as I
wrote, a TAB separator should always align the second column) as pr's
rather weird behavior of sometimes "miscounting," which I outlined a
few mails earlier and after which I gave up on fixing it :-)

> 
> > [...]
> > That being said, I don't see this exact distinction reflected in the
> > code, so perhaps I just misunderstood.
> 
> Disabling "Tabification" only when "-s" was active is missing.  That
> resulted in the 2007 bug.  Making the needed special treatment always
> used fixed the 2007 bug, but broke your use case.
> 
> That some special treatment is needed and intended can be gleaned
> from the following comment (with line numbers from pr.c in the
> current master branch @ 2de30c7350a77b091afa1eb284acdf082c0f6aa5):
> 
> 1031  /* It's rather pointless to define a TAB separator with column
> 1032     alignment */
> 
> My patch adds the special treatment, since it works both for the 2007
> bug and this bug (bug#46422).
> 
> > > It seems to me as if "untabify_input = true;" should be re-introduced
> > > in one additional place to fix the regression from commit 553d347,
> > > please see the attached patch version 3.
> > >
> > > > I'd like to ask the GNU Coreutils maintainers to consider merging
> > > > the attached patch.
> > >
> > > The latest version, i.e., v3 for now.
> >
> > I can only second this, with the patch my rather obscure (and complex)
> > use case of printing thousands of lines of code works properly now!
> 
> Thanks for testing!

Thanks to all of you

~leo






Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Sun, 14 Feb 2021 19:23:02 GMT)

Message #44 received at 46422 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in
 multicolumn outpt?
Date: Sun, 14 Feb 2021 20:22:28 +0100
Hi,

On 13.02.21 21:28, Leonard Janis Robert König wrote:
> On Sat, 2021-02-13 at 21:15 +0100, Erik Auerswald wrote:
>> On 13.02.21 19:29, Leonard Janis Robert König wrote:
>>> [...]
>>> That being said, I don't see this exact distinction reflected in the
>>> code, so perhaps I just misunderstood.
>>
>> Disabling "Tabification" only when "-s" was active is missing.  That
>> resulted in the 2007 bug.  Making the needed special treatment always
>> used fixed the 2007 bug, but broke your use case.
>>
>> That some special treatment is needed and intended can be gleaned
>> from the following comment (with line numbers from pr.c in the
>> current master branch @ 2de30c7350a77b091afa1eb284acdf082c0f6aa5):
>>
>> 1031  /* It's rather pointless to define a TAB separator with column
>> 1032     alignment */

The code after that comment does not disable alignment, but changes
the separator from a TAB to a space.

>> My patch adds the special treatment, since it works both for the 2007
>> bug and this bug (bug#46422).

The attached version 4 of my patch does that in a way that shows the
intent more clearly.  I think this is a better fix for the 2007 bug
than commit 553d347d3e08e00ee4f9df520b37c964c3f26e28.  Expanding TABs
on input is enabled unless a single TAB is used as column separator.
This conforms better to POSIX and does not introduce the regression
that causes the current bug (bug#46422).

I have added more test cases, because manual testing showed that
the options "-s" and "-s$'\t'" were treated differently by pr.

Using "-s" to activate the default TAB separator should result
in the same output as using "-s$'\t'" to specify one TAB character
as separator, i.e., the default, explicitly.
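The thread does not say which check in pr.c made the two spellings behave differently. As a hedged illustration only, with hypothetical function names that are not pr.c's logic, this Python sketch contrasts a decision based on how -s was spelled with one based on the resolved separator value. In Python, the string "-s\t" corresponds to what the shell passes for -s$'\t'.

```python
TAB = "\t"


def parse_s_option(arg: str) -> str:
    """Resolve a '-s[char]' argument to the separator character.
    Plain '-s' selects the default TAB (simplified: ignores -w)."""
    if not arg.startswith("-s"):
        raise ValueError("not an -s option: %r" % arg)
    rest = arg[2:]
    if rest == "":
        return TAB
    if len(rest) != 1:
        # Rejecting longer arguments is a choice of this sketch.
        raise ValueError("-s takes a single character")
    return rest


def tab_sep_by_spelling(arg: str) -> bool:
    """Fragile: looks at how the option was written."""
    return arg == "-s"


def tab_sep_by_value(arg: str) -> bool:
    """Robust: looks at the resolved separator."""
    return parse_s_option(arg) == TAB


print(tab_sep_by_spelling("-s"), tab_sep_by_spelling("-s\t"))  # True False
print(tab_sep_by_value("-s"), tab_sep_by_value("-s\t"))        # True True
```

Normalizing the option to a single value before any later decision is what makes "-s" and "-s$'\t'" indistinguishable downstream.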

>>> [...] with the patch my rather obscure (and complex)
>>> use case of printing thousands of lines of code works properly now!
>>
>> Thanks for testing!
> 
> Thanks all to you

May I ask you to test the new patch (v4) as well?

Thanks,
Erik
[coreutils-pr-fix_bug_46422.v4.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Sun, 14 Feb 2021 19:35:01 GMT)

Message #47 received at 46422 <at> debbugs.gnu.org:

From: Leonard Janis Robert König <ljrk <at> ljrk.org>
To: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in
 multicolumn outpt?
Date: Sun, 14 Feb 2021 20:34:00 +0100
On Sun, 2021-02-14 at 20:22 +0100, Erik Auerswald wrote:
> Hi,
> 
> On 13.02.21 21:28, Leonard Janis Robert König wrote:
> > On Sat, 2021-02-13 at 21:15 +0100, Erik Auerswald wrote:
> > > On 13.02.21 19:29, Leonard Janis Robert König wrote:
> > > > [...]
> > > > That being said, I don't see this exact distinction reflected in the
> > > > code, so perhaps I just misunderstood.
> > >
> > > Disabling "Tabification" only when "-s" was active is missing.  That
> > > resulted in the 2007 bug.  Making the needed special treatment always
> > > used fixed the 2007 bug, but broke your use case.
> > >
> > > That some special treatment is needed and intended can be gleaned
> > > from the following comment (with line numbers from pr.c in the
> > > current master branch @ 2de30c7350a77b091afa1eb284acdf082c0f6aa5):
> > >
> > > 1031  /* It's rather pointless to define a TAB separator with column
> > > 1032     alignment */
> 
> The code after that comment does not disable alignment, but changes
> the separator from a TAB to a space.
> 
> > > My patch adds the special treatment, since it works both for the
> > > 2007 bug and this bug (bug#46422).
> 
> The attached version 4 of my patch does that in a way that more
> clearly shows the intent.  I think this is a better fix for the
> 2007 bug than commit 553d347d3e08e00ee4f9df520b37c964c3f26e28.
> Expanding TABs on input is enabled unless when a single TAB is
> used as column separator.  This conforms better to POSIX and
> does not introduce the regression that causes the current bug
> (bug#46422).
> 
> I have added more test cases, because manual testing showed that
> the options "-s" and "-s$'\t'" were treated differently by pr.
> 
> Using "-s" to activate the default TAB separator should result
> in the same output as using "-s$'\t'" to specify one TAB character
> as separator, i.e., the default, explicitly.
> 
> > > > [...] with the patch my rather obscure (and complex)
> > > > use case of printing thousands of lines of code works properly now!
> > > 
> > > Thanks for testing!
> > 
> > Thanks all to you
> 
> May I ask you to test the new patch (v4) as well?

Sure!  I get output identical to the previous version (I don't use
`-s`), so it doesn't seem to break anything and fixes the bug as
intended.

Thanks again!

~leo






Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Sun, 14 Feb 2021 23:05:01 GMT)

Message #50 received at 46422 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>,
 Leonard Janis Robert König <ljrk <at> ljrk.org>
Cc: 46422 <at> debbugs.gnu.org
Subject: Re: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in
 multicolumn outpt?
Date: Sun, 14 Feb 2021 23:04:21 +0000
On 14/02/2021 19:22, Erik Auerswald wrote:
> May I ask you to test the new patch (v4) as well?

This version looks good.
I'll probably apply this after a little more local testing.

Thanks to both of you!
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#46422; Package coreutils. (Mon, 15 Feb 2021 07:20:02 GMT)

Message #53 received at 46422 <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 46422 <at> debbugs.gnu.org,
 Leonard Janis Robert König <ljrk <at> ljrk.org>
Subject: Re: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in
 multicolumn outpt?
Date: Mon, 15 Feb 2021 08:19:43 +0100
On Sun, Feb 14, 2021 at 11:04:21PM +0000, Pádraig Brady wrote:
> On 14/02/2021 19:22, Erik Auerswald wrote:
> >May I ask you to test the new patch (v4) as well?
> 
> This version looks good.
> I'll probably apply this after a little more local testing.

Thanks!




Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Mon, 15 Feb 2021 21:34:01 GMT)

Notification sent to Leonard Janis Robert König <ljrk <at> ljrk.org>:
bug acknowledged by developer. (Mon, 15 Feb 2021 21:34:02 GMT)

Message #58 received at 46422-done <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
Cc: Leonard Janis Robert König <ljrk <at> ljrk.org>,
 46422-done <at> debbugs.gnu.org
Subject: Re: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in
 multicolumn outpt?
Date: Mon, 15 Feb 2021 21:33:03 +0000
On 15/02/2021 07:19, Erik Auerswald wrote:
> On Sun, Feb 14, 2021 at 11:04:21PM +0000, Pádraig Brady wrote:
>> On 14/02/2021 19:22, Erik Auerswald wrote:
>>> May I ask you to test the new patch (v4) as well?
>>
>> This version looks good.
>> I'll probably apply this after a little more local testing.

Pushed at:
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v8.32-108-gbd6c97dee

Marking this as done.

thanks again,
Pádraig




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 16 Mar 2021 11:24:05 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.