GNU bug report logs - #29606
Command 'fold' dangerous with utf-8 input

Package: coreutils

Reported by: Mark Roberts <mroberts <at> rapid-arts-movement.de>

Date: Thu, 7 Dec 2017 16:27:02 UTC

Severity: normal

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mark Roberts <mroberts <at> rapid-arts-movement.de>
To: bug-coreutils <at> gnu.org
Subject: Command 'fold' dangerous with utf-8 input
Date: Thu, 7 Dec 2017 11:10:02 +0100 (CET)
Dear maintainers,

I am using fold version 8.13 on Debian 3.2.93-1:

> cat filename | fold

If 'filename' contains UTF-8 characters that consist of more than one byte,
fold may break the line in the middle of such a character. There is no
option to prevent this.

Except, of course, "-s" (break at spaces), but that may not be what the user
wants.
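
A minimal sketch of the failure mode described above, assuming that the line is wrapped purely by byte count. The helper `fold_bytes` is hypothetical and only models that behaviour; it is not taken from fold's source.

```python
def fold_bytes(data: bytes, width: int = 80) -> list:
    """Split data into chunks of at most `width` bytes, ignoring character boundaries."""
    return [data[i:i + width] for i in range(0, len(data), width)]

# 79 ASCII bytes followed by one 2-byte character (U+00E9) = 81 bytes in total.
line = ("a" * 79 + "\u00e9").encode("utf-8")
chunks = fold_bytes(line)

# The 2-byte sequence 0xC3 0xA9 straddles the 80-byte boundary and is cut in half,
# so neither chunk is valid UTF-8 on its own.
for chunk in chunks:
    try:
        chunk.decode("utf-8")
        print(len(chunk), "bytes: valid UTF-8")
    except UnicodeDecodeError:
        print(len(chunk), "bytes: invalid UTF-8")
```

A terminal or editor that receives such output typically shows replacement characters at the break.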

According to the man page, fold counts columns by default, not bytes. This
does not seem to be true: the switch "-b" (count bytes) has no influence on
the output in my test case.
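
To make the distinction concrete: bytes, characters and display columns are three different counts for the same string. The sketch below is only an illustration. Approximating column width with `unicodedata.east_asian_width` is an assumption made here and says nothing about how fold itself counts; `display_columns` is a hypothetical helper.

```python
import unicodedata

def display_columns(text: str) -> int:
    """Approximate terminal columns: wide/fullwidth characters take 2, combining marks 0."""
    cols = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous character
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols

# Columns printed: text, byte count, character count, approximate display columns.
for text in ("fold", "f\u00fcr", "\u65e5\u672c\u8a9e"):
    print(text, len(text.encode("utf-8")), len(text), display_columns(text))
```

For pure ASCII all three counts agree, which is why the difference only shows up with multi-byte input.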

How to fix this?

I presume that one of the following holds:

(1) The default behavior (counting columns) is not what I expect, namely
counting characters instead of bytes. This would have to be clarified in the
man page.

(2) The default is not what the man page says it is: possibly the default
set in the code is to count bytes. This would be an error.

(3) 'fold' fails to read my "LANG" environment variable, which clearly
states a UTF-8 locale. This, in 2017, is an error.
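
For comparison, a sketch of the behaviour expected in this report: wrapping after a fixed number of characters, so that a break never falls inside a multi-byte sequence. The helper `fold_chars` is hypothetical. It assumes valid UTF-8 input and counts code points, not display columns or grapheme clusters.

```python
def fold_chars(data: bytes, width: int = 80) -> list:
    """Wrap UTF-8 input after at most `width` code points, never inside one."""
    text = data.decode("utf-8")
    return [text[i:i + width].encode("utf-8") for i in range(0, len(text), width)]

# 79 ASCII characters plus two 2-byte characters: 81 characters, 83 bytes.
line = ("a" * 79 + "\u00e9\u00e9").encode("utf-8")
chunks = fold_chars(line)

for chunk in chunks:
    # Every chunk decodes cleanly because breaks fall only between characters.
    print(len(chunk.decode("utf-8")), "characters,", len(chunk), "bytes")
```

The first chunk is longer than 80 bytes but exactly 80 characters, which is the trade-off a character-based default implies.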


Please write back to mroberts <at> rapid-arts-movement.de if you need example 
data or clarifications.

Thank you,
Mark Roberts




This bug report was last modified 7 years and 169 days ago.

