GNU bug report logs - #7372
multibyte: fmt and multi-byte encodings

Package: coreutils;

Reported by: Ineiev <ineiev <at> gmail.com>

Date: Thu, 11 Nov 2010 09:42:03 UTC

Severity: wishlist

To reply to this bug, email your comments to 7372 AT debbugs.gnu.org.


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7372; Package coreutils. (Thu, 11 Nov 2010 09:42:03 GMT)

Acknowledgement sent to Ineiev <ineiev <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 11 Nov 2010 09:42:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ineiev <ineiev <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: fmt and multi-byte encodings
Date: Thu, 11 Nov 2010 13:32:43 +0400

Hello;

Today I fed a Russian text in UTF-8 to fmt
and discovered that the utility counts the line width
in bytes rather than in characters (the lines written in
Cyrillic were roughly half as long as the lines
written in Latin script), which was not what I wanted.
I checked fmt from coreutils-8.6.

As a workaround, I could iconv the text into a single-byte
encoding like KOI8-R, but I would limit the character
set then.

I've never used fmt before personally, so actually I'm not
sure whether it was a bug or I did something wrong.

Any hints?
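
For reference, the discrepancy described above can be reproduced without fmt. The following is a minimal Python sketch, not anything taken from coreutils; the sample strings are made up for illustration. It shows that Cyrillic text in UTF-8 takes roughly two bytes per letter, and that converting to KOI8-R gives one byte per letter but cannot encode characters outside that set.

```python
# Illustration only: compare character counts and byte counts.
latin = "Hello world"
cyrillic = "Привет мир"  # sample text, not taken from the report

for text in (latin, cyrillic):
    chars = len(text)                        # Unicode code points
    utf8_bytes = len(text.encode("utf-8"))   # what a byte-counting tool sees
    print(f"{text!r}: {chars} characters, {utf8_bytes} bytes in UTF-8")

# KOI8-R stores each Cyrillic letter in a single byte ...
koi8_bytes = len(cyrillic.encode("koi8_r"))

# ... but it cannot represent characters outside its repertoire, e.g. "æ".
try:
    "æ".encode("koi8_r")
    koi8_has_ae = True
except UnicodeEncodeError:
    koi8_has_ae = False

print(f"KOI8-R bytes: {koi8_bytes}, can encode 'æ': {koi8_has_ae}")
```

A tool that measures width in bytes therefore treats the Cyrillic sample as nearly twice as wide as it appears.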




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7372; Package coreutils. (Thu, 11 Nov 2010 15:57:01 GMT)

Message #8 received at 7372 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Ineiev <ineiev <at> gmail.com>
Cc: 7372 <at> debbugs.gnu.org
Subject: Re: bug#7372: fmt and multi-byte encodings
Date: Thu, 11 Nov 2010 16:01:02 +0000

On 11/11/10 09:32, Ineiev wrote:
> Hello;
> 
> Today I fed a Russian text in UTF-8 to fmt
> and discovered that the utility counts the line width
> in bytes rather than in characters (the lines written in
> Cyrillic were roughly half as long as the lines
> written in Latin script), which was not what I wanted.
> I checked fmt from coreutils-8.6.
> 
> As a workaround, I could iconv the text into a single-byte
> encoding like KOI8-R, but I would limit the character
> set then.
> 
> I've never used fmt before personally, so actually I'm not
> sure whether it was a bug or I did something wrong.
> 
> Any hints?

We're starting to apply multi-byte support,
so hopefully this will be fixed soon.

$ echo "1 2 æ 4 5 6" | fmt -w6
1 2
æ 4
5 6

That is with the official Fedora
version of `fmt`

cheers,
Pádraig





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7372; Package coreutils. (Fri, 12 Nov 2010 07:09:02 GMT)

Message #11 received at 7372 <at> debbugs.gnu.org:

From: Ineiev <ineiev <at> gmail.com>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 7372 <at> debbugs.gnu.org
Subject: Re: bug#7372: fmt and multi-byte encodings
Date: Fri, 12 Nov 2010 11:13:20 +0400

On 11/11/10, Pádraig Brady <P <at> draigbrady.com> wrote:
> We're starting to apply multi-byte support,
> so hopefully this will be fixed soon.

Could you provide a link?

> $ echo "1 2 æ 4 5 6" | fmt -w6
> 1 2
> æ 4
> 5 6

While it should be
1 2 æ
4 5 6

> That is with the official Fedora
> version of `fmt`

I can confirm that fmt from official GNU coreutils-8.6 does the same.

Thanks,
Ineiev
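
The two outputs discussed above can be compared with a small Python sketch. This is a simple greedy line filler, not fmt's actual paragraph-optimizing algorithm, and `greedy_wrap`, `count_chars` and `count_bytes` are hypothetical helper names. The limit of 5 is an assumption picked so that `1 2 3` fits on one line; it is not a claim about how fmt derives its limit from `-w6`.

```python
def greedy_wrap(text, limit, measure):
    """Fill lines greedily; `measure` returns the size of a candidate line."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > limit:
            lines.append(current)   # candidate is too long: break before word
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def count_chars(s):
    return len(s)                   # Unicode code points


def count_bytes(s):
    return len(s.encode("utf-8"))   # bytes in UTF-8


sample = "1 2 æ 4 5 6"
by_chars = greedy_wrap(sample, 5, count_chars)
by_bytes = greedy_wrap(sample, 5, count_bytes)

print("\n".join(by_chars))
print("--")
print("\n".join(by_bytes))
```

Measured in characters, `1 2 æ` is 5 units and fits; measured in bytes it is 6, because `æ` takes two bytes in UTF-8, so the line breaks early.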




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7372; Package coreutils. (Fri, 12 Nov 2010 09:57:02 GMT)

Message #14 received at submit <at> debbugs.gnu.org:

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#7372: fmt and multi-byte encodings
Date: Fri, 12 Nov 2010 11:00:41 +0100

On Fri, Nov 12, 2010 at 11:13:20AM +0400, Ineiev wrote:
> On 11/11/10, Pádraig Brady <P <at> draigbrady.com> wrote:
> > We're starting to apply multi-byte support,
> > so hopefully this will be fixed soon.
> 
> Could you provide a link?
> 
> > $ echo "1 2 æ 4 5 6" | fmt -w6
> > 1 2
> > æ 4
> > 5 6
> 
> While it should be
> 1 2 æ
> 4 5 6
> 
> > That is with the official Fedora
> > version of `fmt`
> 
> I can confirm that fmt from official GNU coreutils-8.6 does the same.

The same with coreutils 6.10 on Debian Lenny:

$ echo "1 2 3 4 5 6" | fmt -w6
1 2 3
4 5 6
$ echo "1 2 ü 4 5 6" | fmt -w6
1 2
ü 4
5 6
$ fmt --version
fmt (GNU coreutils) 6.10
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Ross Paterson.
$

(Just as a reference.)

Erik
-- 
But heck, system administration is hard, what's a little more rope?
Here, hold this gun while I position your foot...
                        -- Valerie Aurora
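
As long as fmt measures width in bytes, one possible workaround is to wrap the text with a tool that counts characters. Below is a minimal sketch using the `textwrap` module from Python's standard library. It is not a drop-in replacement for fmt, and the width of 5 is an assumption chosen so that `1 2 3` fits on one line, not a value derived from `-w6`.

```python
import textwrap

sample = "1 2 ü 4 5 6"

# textwrap measures line length in Unicode code points, not bytes,
# so "ü" counts as one unit even though it is two bytes in UTF-8.
wrapped = textwrap.fill(sample, width=5)
print(wrapped)
```

Note that `textwrap` counts code points rather than display columns, so combining characters and double-width characters would still be measured incorrectly.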




Severity set to 'wishlist' from 'normal' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 15 Oct 2018 17:25:01 GMT)

Changed bug title to 'multibyte: fmt and multi-byte encodings' from 'fmt and multi-byte encodings' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 15 Oct 2018 17:25:01 GMT)
