Package: coreutils;
Reported by: Leonard Janis Robert König <ljrk <at> ljrk.org>
Date: Wed, 10 Feb 2021 13:21:02 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Leonard Janis Robert König <ljrk <at> ljrk.org> To: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de> Cc: 46422 <at> debbugs.gnu.org Subject: bug#46422: [PATCH] Re: bug#46422: 'pr' screws up tabstops in multicolumn outpt? Date: Sat, 13 Feb 2021 21:28:49 +0100
On Sat, 2021-02-13 at 21:15 +0100, Erik Auerswald wrote: > Hi, > > On 13.02.21 19:29, Leonard Janis Robert König wrote: > > > > first: Thank you very much for the work, I really owe you one! > > You're welcome. :-) > > > On Sat, 2021-02-13 at 17:58 +0100, Erik Auerswald wrote: > > > On 13.02.21 15:17, Erik Auerswald wrote: > > > > On 11.02.21 20:20, Erik Auerswald wrote: > > > > > On Thu, Feb 11, 2021 at 06:09:28PM +0100, Leonard Janis > > > > > Robert > > > > > König > > > > > wrote: > > > > > > On Thu, 2021-02-11 at 16:45 +0100, Erik Auerswald wrote: > > > > > > > On Thu, Feb 11, 2021 at 04:12:54PM +0100, Leonard Janis > > > > > > > Robert > > > > > > > König wrote: > > > > > > > > On Thu, 2021-02-11 at 13:00 +0100, Erik Auerswald > > > > > > > > wrote: > > > > > > > > > On Wed, Feb 10, 2021 at 01:42:29PM +0100, Leonard > > > > > > > > > Janis > > > > > > > > > Robert > > > > > > > > > König wrote: > > > > > > > > > > I'm sorry if I this is not a bug but to be > > > > > > > > > > expected, > > > > > > > > > > but I thnk > > > > > > > > > > pr doesn't get the alignment of tabs in multicolumn > > > > > > > > > > output > > > > > > > > > > right. [...] This seems *kind* of related to > > > > > > > > > > multi- > > > > > > > > > > column > > > > > > > > > > merged output, as was discussed some years ago > > > > > > > > > > here: > > > > > > > > > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html > > > > > > > > > > > > > > > > > > This thread contains the bug-introducing patch in > > > > > > > > > message > > > > > > > > > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00160.html > > > > > > > > > > > > > > > > > > This is commit > > > > > > > > > 553d347d3e08e00ee4f9df520b37c964c3f26e28. > > > > > > > > > That commit removed the 'assume -e' part of the POSIX > > > > > > > > > description > > > > > > > > > of the -COLUMN option from GNU pr. > > > > > > > > [...] > > > > > I have found a fix to the problem described by you. I am > > > > > quite > > > > > sure that > > > > > this is not *correct*, but I did not find a way to make > > > > > print_sep_string() > > > > > account for tabs that did not break quite a few existing > > > > > tests, > > > > > even if > > > > > the merged files problem from 2007 and this columnating bug > > > > > were > > > > > both > > > > > fixed. Thus I just tighten the 2007 bug fix to apply in less > > > > > cases. > > > > > This way all existing tests pass, and a new one pertaining to > > > > > this bug > > > > > report passes, too. I do think this is in the same spirit as > > > > > the > > > > > "fix" > > > > > from 2007 (commit 553d347d3e08e00ee4f9df520b37c964c3f26e28). > > > > > > > > I think the attached patch is a better fix than my previous > > > > one, > > > > because it applies the special treatment of TAB as separator > > > > more > > > > consistently. It may still not be complete (the code seems > > > > quite > > > > convoluted to me) but I do think it improves the situation > > > > significantly, and does not make it worse. > > > > Hm, I'm not sure whether I understand this special case. When we > > have > > a tab as column separator, doesn't this imply that the second > > column is > > starting on a position n*8, (effectively equivalent to the first > > column), thus guaranteeing that the alignment is honored? So, if > > my > > Whatever the reason (perhaps conforming to POSIX, perhaps other pr > implementations doing the same), GNU pr implements a special > treatment for TAB as column separator, and the thread from 2007 > implies that the pr from HP-UX does as well. > > The POSIX spec says: > > "-s[char] > > Separate text columns by the single character char instead of > by the appropriate number of <space> characters (default for > char shall be <tab>)." > > https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pr.html > > So use of -s needs to always result in one separator character > between columns. > > This is implemented by GNU pr, and seemingly by pr from HP-UX, too > ( > https://lists.gnu.org/archive/html/bug-coreutils/2007-03/msg00121.html > ) > > Of all the printable ASCII characters, only TAB results in > interactions with "Tabification," i.e., turning TABs into spaces > on input and spaces into TABs on output. Thus only TAB as > separator may require the special treatment of disabling > "Tabification." > > Omitting this special treatment resulted in the bug from 2007. > > Removing the implicit "-e" and "-i" from "-NUMBER" and "-m" > to fix the 2007 bug resulted in this bug (bug#46422), and does > not conform to the POSIX specification nor to the GNU pr info > documentation. > > My v3 patch restricts this special treatment of "-s" to just > the cases where it is used without specifying a separator and > thus using the default of TAB, or when it is used with a single > TAB ("-s$'\t'"). Thus it restricts the 2007 change from commit > 553d347d3e08e00ee4f9df520b37c964c3f26e28 to affect only those > use cases it should affect, instead of all multi-column use cases. > > It may be possible to add some appropriate special treatment for > TAB as separator without disabling "Tabification." But I do not > know how. Just accounting for the output position change resulting > from printing a TAB in print_sep_string() does not work, i.e., > breaks many of the existing tests. This makes sense then, yes. In theory, this'd be the best approach, but I don't see that happening with the current code base... . The issue is less with enabling/disabling tabification, since a tab character should, as I wrote, always align the second column -- but with the rather weird behavior of pr sometimes "miscounting" that I outlined a few mails earlier, after which I gave up fixing :-) > > > [...] > > That being said, I don't see this exact distinction reflected in > > the > > code, so perhaps I just misunderstood. > > Disabling "Tabification" only when "-s" was active is missing. That > resulted in the 2007 bug. Making the needed special treatment always > used fixed the 2007 bug, but broke your use case. > > That some special treatment is needed and intended can be gleaned > from the following comment (with line numbers from pr.c in the > current master branch @ 2de30c7350a77b091afa1eb284acdf082c0f6aa5): > > 1031 /* It's rather pointless to define a TAB separator with column > 1032 alignment */ > > My patch adds the special treatment, since it works both for the 2007 > bug and this bug (bug#46422). > > > > It seems to me as if "untabify_input = true;" should be re- > > > introduced > > > in one additional place to fix the regression from commit > > > 553d347, > > > please see the attached patch version 3. > > > > > > > I'd like to ask the GNU Coreutils maintainers to consider > > > > merging > > > > the attached patch. > > > > > > The latest version, i.e., v3 for now. > > > > I can only second this, with the patch my rather obscure (and > > complex) > > use case of printing thousands of lines of code works properly now! > > Thanks for testing! Thanks all to you ~leo
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.