GNU bug report logs - #7085
fmt (GNU coreutils) 6.10

Package: coreutils

Reported by: "Denis M. Wilson" <dmw <at> oxytropis.plus.com>

Date: Wed, 22 Sep 2010 20:23:01 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 7085 in the body.
You can then email your comments to 7085 AT debbugs.gnu.org in the normal way.

Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7085; Package coreutils. (Wed, 22 Sep 2010 20:23:02 GMT)

Acknowledgement sent to "Denis M. Wilson" <dmw <at> oxytropis.plus.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 22 Sep 2010 20:23:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Denis M. Wilson" <dmw <at> oxytropis.plus.com>
To: bug-coreutils <at> gnu.org
Subject: fmt (GNU coreutils) 6.10
Date: Wed, 22 Sep 2010 20:59:06 +0100
This program does not deal properly with CRLF terminators.
Gratuitous CRs are left in joined lines; they should be
removed. The user may want to keep the CRLF style or change
to Unix (LF). There should be an option for this.

Denis M. Wilson

-- 




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7085; Package coreutils. (Wed, 22 Sep 2010 21:06:02 GMT)

Message #8 received at 7085 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: "Denis M. Wilson" <dmw <at> oxytropis.plus.com>
Cc: 7085 <at> debbugs.gnu.org
Subject: Re: bug#7085: fmt (GNU coreutils) 6.10
Date: Wed, 22 Sep 2010 15:08:01 -0600
On 09/22/2010 01:59 PM, Denis M. Wilson wrote:
> This program does not deal properly with CRLF terminators.
> Gratuitous CRs are left in joined lines; they should be
> removed. The user may want to keep the CRLF style or change
> to Unix (LF). There should be an option for this.

Thanks for the report.

POSIX is clear that CR is data and not a line terminator, so the 
existing behavior (in the absence of any new command-line option) is 
correct.
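To see the behavior described here, a small shell sketch can count the CR bytes that survive a pass through fmt. The helper name count_cr is hypothetical, introduced only for illustration. Where exactly the CRs land in fmt's output may differ between versions, so the sketch only counts them.

```shell
# count_cr: hypothetical helper that prints how many CR bytes arrive on stdin.
count_cr() {
  tr -cd '\r' | wc -c | tr -d ' '
}

# Two CRLF-terminated lines go straight into fmt; the CRs are treated as
# data, so the count printed here is expected to be non-zero.
printf 'one\r\ntwo\r\n' | fmt | count_cr

# The same input with the CRs deleted first; this prints 0.
printf 'one\r\ntwo\r\n' | tr -d '\r' | fmt | count_cr
```

Piping the first command into od -c instead of count_cr shows where the stray CRs end up in the joined line.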

Now, we have a philosophy question - is it better to teach fmt a new 
command line option, and then have to wait for a new enough coreutils 
installation to propagate to various machines where you plan on using 
that extension, or is it better to use existing tools that are already 
portable according to POSIX and do the job now?  If you answered this 
question the same as me, then a solution that works now is favorable, so 
it becomes a question of massaging your files to get rid of the CR 
characters prior to using fmt on the file.  Something as simple as:

tr -d '\r' < file > file.out && mv file.out file

will do the trick.  Other common wrappers, like dos2unix, exist to make 
this sanitization process even easier.  So, I'm reluctant to add such a 
feature myself, and it will take some pretty strong arguments (such as 
existing practice in other fmt implementations) to convince me that a 
new option is worthwhile.
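As a minimal sketch of the one-liner above wrapped for reuse: the function name strip_cr_in_place is hypothetical, and the use of mktemp is an assumption (it is widely available but not required by POSIX). The original file is replaced only if tr succeeds.

```shell
# strip_cr_in_place: hypothetical wrapper around the tr one-liner.
# Writes the CR-free copy to a temporary file next to the original and
# replaces the original only if tr succeeded.
strip_cr_in_place() {
  file=$1
  tmp=$(mktemp "$file.XXXXXX") || return 1
  if tr -d '\r' < "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage (hypothetical file name):
#   strip_cr_in_place notes.txt && fmt notes.txt
```

Like the one-liner, this replaces the file rather than editing it, so the original permissions and ownership are not necessarily preserved.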

Meanwhile, you may want to consider upgrading to a newer coreutils; the 
latest stable version is 8.5, with a number of bug fixes in various tools.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7085; Package coreutils. (Mon, 27 Sep 2010 22:24:01 GMT)

Message #11 received at 7085 <at> debbugs.gnu.org:

From: William Plusnick <pwplusnick2 <at> gmail.com>
To: "Denis M. Wilson" <dmw <at> oxytropis.plus.com>
Cc: 7085 <at> debbugs.gnu.org
Subject: Re: bug#7085: fmt (GNU coreutils) 6.10
Date: Mon, 27 Sep 2010 17:26:09 -0500
On Wed, Sep 22, 2010 at 2:59 PM, Denis M. Wilson <dmw <at> oxytropis.plus.com> wrote:

> This program does not deal properly with CRLF terminators.
> Gratuitous CRs are left in joined lines; they should be
> removed. The user may want to keep the CRLF style or change
> to Unix (LF). There should be an option for this.
>
> Denis M. Wilson
>
> --
>
>
>
Here is a patch that I wrote because I was bored today (I finished all my
homework this weekend :^p) that will remove CRs from lines ending in CRLF. I
must warn you that I didn't add documentation to the --help (and by
implication the man page) or to the texi files. I did this to minimize the
number of changes, so as to become less likely to conflict with future
commits, since this is most likely not going into the mainstream repository.
You invoke it like this:
fmt -d [file]

or alternatively:
fmt --dos [file]

The reason I sent this to everyone and not just Denis Wilson is that I
wanted to be able to point to it in the future if people want this feature
and are willing to risk future incompatibility.

I believe it works, though I haven't tested it too thoroughly. Valgrind
doesn't complain and it seems to do the job. (I downloaded a file written
on MS-DOS and ran it through via 'fmt -d file.txt', and it works on it.)

Hope this helps someone,
William
From 12a2bee879e3c803f872fe1960a1dedaed485d10 Mon Sep 17 00:00:00 2001
From: Patrick W. Plusnick II <pwplusnick2 <at> gmail.com>
Date: Mon, 27 Sep 2010 16:57:06 -0500
Subject: [PATCH] fmt: added the -d option so that it removes Carriage
Returns from MS-DOS files

* src/fmt.c: simply removes the Carriage Returns from lines ending with
CRLF.
---
 src/fmt.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/fmt.c b/src/fmt.c
index 8a5d8bd..9150f43 100644
--- a/src/fmt.c
+++ b/src/fmt.c
@@ -173,6 +173,8 @@ static void put_space (int space);
 /* If true, first 2 lines may have different indent (default false).  */
 static bool crown;

+/* If true, Removes the CR out of CRLFs. Mainly for MS-DOS files */
+static bool trunc_crlf;
 /* If true, first 2 lines _must_ have different indent (default false).  */
 static bool tagged;

@@ -304,6 +306,7 @@ With no FILE, or when FILE is -, read standard input.\n"),
 static struct option const long_options[] =
 {
   {"crown-margin", no_argument, NULL, 'c'},
+  {"dos", no_argument, NULL, 'd'},
   {"prefix", required_argument, NULL, 'p'},
   {"split-only", no_argument, NULL, 's'},
   {"tagged-paragraph", no_argument, NULL, 't'},
@@ -329,7 +332,7 @@ main (int argc, char **argv)

   atexit (close_stdout);

-  crown = tagged = split = uniform = false;
+  crown = trunc_crlf = tagged = split = uniform = false;
   max_width = WIDTH;
   prefix = "";
   prefix_length = prefix_lead_space = prefix_full_length = 0;
@@ -345,7 +348,7 @@ main (int argc, char **argv)
       argc--;
     }

-  while ((optchar = getopt_long (argc, argv, "0123456789cstuw:p:",
+  while ((optchar = getopt_long (argc, argv, "0123456789cdstuw:p:",
                                  long_options, NULL))
          != -1)
     switch (optchar)
@@ -361,6 +364,10 @@ main (int argc, char **argv)
         crown = true;
         break;

+      case 'd':
+        trunc_crlf = true;
+        break;
+
       case 's':
         split = true;
         break;
@@ -691,6 +698,8 @@ get_line (FILE *f, int c)
       word_limit++;
     }
   while (c != '\n' && c != EOF);
+  if (c == '\n' && *(wptr-1) == '\r' && trunc_crlf)
+    *--wptr = '\n';
   return get_prefix (f);
 }

-- 
1.7.0.4

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7085; Package coreutils. (Mon, 27 Sep 2010 23:21:02 GMT)

Message #14 received at 7085 <at> debbugs.gnu.org:

From: William Plusnick <pwplusnick2 <at> gmail.com>
To: "Denis M. Wilson" <dmw <at> oxytropis.plus.com>
Cc: 7085 <at> debbugs.gnu.org
Subject: Re: bug#7085: fmt (GNU coreutils) 6.10
Date: Mon, 27 Sep 2010 18:23:44 -0500
>
> I believe it works, though I haven't tested it too thoroughly. Valgrind
> doesn't complain and it seems to do the job. (I downloaded a file written
> on MS-DOS and ran it through via 'fmt -d file.txt', and it works on it.)
>

Well, this is a bit embarrassing: it appears that my patch does indeed
change the output, in a way that was subtle in the particular file I was
working with, and it is obvious now as to why. Oh well, that is what I get
for not testing much.

So it is indeed a broken, unofficial feature.

I'll try to fix it,
William

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7085; Package coreutils. (Tue, 28 Sep 2010 03:26:01 GMT)

Message #17 received at 7085 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: William Plusnick <pwplusnick2 <at> gmail.com>
Cc: 7085 <at> debbugs.gnu.org, "Denis M. Wilson" <dmw <at> oxytropis.plus.com>
Subject: Re: bug#7085: fmt (GNU coreutils) 6.10
Date: Mon, 27 Sep 2010 21:28:00 -0600
William Plusnick wrote:
> The reason I sent this to everyone and not just Denis Wilson is that I
> wanted to be able to point to it in the future if people want this feature
> and are willing to risk future incompatibility.

Isn't the more modular flow that Eric suggested more in keeping with
the Unix philosophy and the better solution?  If you want to do both
stripping of carriage returns and word wrapping together then you can
pipe them together.

  tr -d '\r' < somefile | fmt

Or if you are calling it as a stdin filter such as within an editor:

  tr -d '\r' | fmt
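The original report also asked for a way to keep the CRLF style. Staying with the pipeline approach, one possible sketch strips the CRs, reflows, and then puts a CR back before every newline. The function name fmt_crlf is hypothetical, and using awk for the last step is an assumption chosen for portability rather than anything proposed in this thread.

```shell
# fmt_crlf: hypothetical filter that reflows CRLF text and keeps CRLF endings.
# Any arguments are passed through to fmt (for example -w 60).
fmt_crlf() {
  tr -d '\r' | fmt "$@" | awk '{ printf "%s\r\n", $0 }'
}

# Two short CRLF lines are joined by fmt; od -c makes the line endings visible.
printf 'one\r\ntwo\r\n' | fmt_crlf | od -c
```

Note that tr -d '\r' deletes every CR, including any that appear in the middle of a line, so this is only suitable for text where CR occurs solely as part of CRLF.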

Bob




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7085; Package coreutils. (Wed, 29 Sep 2010 20:18:02 GMT)

Message #20 received at 7085 <at> debbugs.gnu.org:

From: William Plusnick <pwplusnick2 <at> gmail.com>
To: Bob Proulx <bob <at> proulx.com>
Cc: 7085 <at> debbugs.gnu.org
Subject: Re: bug#7085: fmt (GNU coreutils) 6.10
Date: Wed, 29 Sep 2010 15:20:31 -0500
> Isn't the more modular flow that Eric suggested more in keeping with
> the Unix philosophy and the better solution?  If you want to do both
> stripping of carriage returns and word wrapping together then you can
> pipe them together.
>
>   tr -d '\r' < somefile | fmt
>
> Or if you are calling it as a stdin filter such as within an editor:
>
>   tr -d '\r' | fmt
>
> Bob
>
Yes, it is indeed. I don't know exactly why I chose to try (and
ultimately fail) to write that patch. I think the koan ESR wrote for
The Rootless Root: The Unix Koans Of Master Foo has some truth to it:
“There is more Unix-nature in one line of shell script than there is
in ten thousand lines of C.”

Finally, in true koan fashion, "I am enlightened." :^)

William




Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Fri, 22 Jul 2011 22:13:01 GMT)

Notification sent to "Denis M. Wilson" <dmw <at> oxytropis.plus.com>:
bug acknowledged by developer. (Fri, 22 Jul 2011 22:13:01 GMT)

Message #25 received at 7085-done <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: 7085-done <at> debbugs.gnu.org
Subject: Re: bug#7085: fmt (GNU coreutils) 6.10
Date: Sat, 23 Jul 2011 00:11:40 +0200
tags 7085 + notabug
close 7085
thanks

Denis M. Wilson wrote:
> This program does not deal properly with CRLF terminators.
> Gratuitous CRs are left in joined lines; they should be
> removed. The user may want to keep the CRLF style or change
> to Unix (LF). There should be an option for this.

As seen in the rest of this thread, this is not a bug.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 20 Aug 2011 11:24:06 GMT)

This bug report was last modified 13 years and 308 days ago.
